Sometimes we need the transcript or subtitles of a YouTube video, but getting one normally means opening the video and generating the transcript by hand. Python has a package named youtube_transcript_api that fetches the transcript automatically so that you can use it as plain text.
First, let us install this package by running:
pip install youtube_transcript_api
Next, we need the id of the YouTube video whose transcript we want to generate. In the URL below, the video id is the text after v=:
https://www.youtube.com/watch?v=Y8Tko2YC5hA
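If you would rather not copy the id by hand, it can be pulled out of the URL with Python's standard library. The sketch below is only an illustration: the helper name extract_video_id is made up for this post, and it assumes the common watch?v=... and youtu.be/... URL shapes, so other link formats would need extra handling.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video id from a YouTube URL, or None if none is found.

    Hypothetical helper; handles only watch?v=<id> and youtu.be/<id> URLs.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links keep the id in the path: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None

    # Regular links keep the id in the "v" query parameter.
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None


print(extract_video_id("https://www.youtube.com/watch?v=Y8Tko2YC5hA"))  # Y8Tko2YC5hA
```

Returning None for anything unrecognised keeps the caller in charge of deciding what to do with a bad link.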
Now, let’s see the code:
from youtube_transcript_api import YouTubeTranscriptApi

def generate_transcript(video_id):
    # get_transcript returns a list of dictionaries, one per caption segment
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
    script = ""
    for text in transcript:
        t = text["text"]
        # skip music cues so they do not end up in the transcript
        if t != '[Music]':
            script += t + " "
    return script, len(script.split())

video_id = 'Y8Tko2YC5hA'
transcript, no_of_words = generate_transcript(video_id)
print(transcript)
Explanation
We define the generate_transcript() function, which accepts the video id as a parameter and returns the transcript as well as the number of words in the transcript.
We use the get_transcript() method of our package, which gets the transcript for the video id provided as a parameter. This method returns a list of dictionaries, so we need to do some processing to convert it to a single string.
We skip any entry whose text is [Music] so that, if there is any music in the video, it will not end up in our final transcript string.
Finally, we set the video id, call the function, and print the transcript.
This package will throw an error if there are no subtitles for the YouTube video whose id you passed.
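The post-processing and the missing-subtitle error described above can be sketched as follows. This is a rough illustration under a few assumptions: the sample segments are made up, the names join_segments and transcript_or_none are hypothetical, and catching a broad Exception is a simplification that you would probably want to narrow. The fetch function is passed in as a parameter so the example runs without a network connection.

```python
def join_segments(segments, skip=("[Music]",)):
    """Join the "text" field of each segment into one string.

    Returns the joined text and its word count.
    """
    parts = [seg["text"] for seg in segments if seg["text"] not in skip]
    text = " ".join(parts)
    return text, len(text.split())


def transcript_or_none(fetch, video_id):
    """Call fetch(video_id); return None instead of raising on failure."""
    try:
        return join_segments(fetch(video_id))
    except Exception as err:  # broad on purpose; narrow this in real code
        print(f"No transcript for {video_id}: {err}")
        return None


# Made-up data in the list-of-dictionaries shape described above.
sample = [
    {"text": "[Music]", "start": 0.0, "duration": 2.5},
    {"text": "hello and welcome", "start": 2.5, "duration": 1.8},
    {"text": "to this video", "start": 4.3, "duration": 1.2},
]

print(join_segments(sample))  # ('hello and welcome to this video', 6)


def always_fails(video_id):
    # Stand-in for a fetch call on a video without subtitles.
    raise RuntimeError("subtitles are not available")


print(transcript_or_none(always_fails, "Y8Tko2YC5hA"))  # None
```

With the real package, you would pass YouTubeTranscriptApi.get_transcript as the fetch argument, as in the code earlier in this post.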