How to get subtitles for YouTube videos using Python

Sometimes we need the transcript or subtitles of a YouTube video. Doing this by hand means opening the video on YouTube and copying the transcript out manually. In Python, the youtube_transcript_api package can fetch the transcript for us so that we can use it as plain text.

First, let us install this package by running:

pip install youtube_transcript_api

Next, we need the id of the YouTube video whose transcript we want. In the URL below, the video id is the value of the v parameter, Y8Tko2YC5hA:

https://www.youtube.com/watch?v=Y8Tko2YC5hA
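If you only have the full URL, the video id can be pulled out with the standard library instead of being copied by hand. The following is a minimal sketch that assumes the URL has the watch?v=... form shown above; the helper name extract_video_id is hypothetical and not part of youtube_transcript_api.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """Return the value of the `v` query parameter of a YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    if "v" not in query:
        # Assumption: only watch?v=... URLs are handled in this sketch.
        raise ValueError(f"No video id found in URL: {url}")
    return query["v"][0]


print(extract_video_id("https://www.youtube.com/watch?v=Y8Tko2YC5hA"))
# prints: Y8Tko2YC5hA
```

Other URL shapes, such as shortened links, would need their own handling.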

Code

Now, let’s see the code:

from youtube_transcript_api import YouTubeTranscriptApi

def generate_transcript(id):
    transcript = YouTubeTranscriptApi.get_transcript(id)
    script = ""

    for text in transcript:
        t = text["text"]
        if t != '[Music]':
            script += t + " "

    return script, len(script.split())

id = 'Y8Tko2YC5hA'
transcript, no_of_words = generate_transcript(id)
print(transcript)

Explanation

  • In line 1, we import the required package.
  • In line 3, we define the generate_transcript() function, which accepts the video id as a parameter and returns the transcript along with the number of words in it.
  • In line 4, we call the package's get_transcript() method, which fetches the transcript for the given video id. It returns a list of dictionaries, one per subtitle segment, so we need some processing to turn it into a single string.
  • In line 7, we loop over the dictionaries in the list and read the text of each segment, then append it to the string.
  • In line 9, we filter out [Music] entries so that music markers in the subtitles do not end up in the final transcript string.
  • In line 12, we return the transcript and its word count.
  • Finally, in line 15, we call our function with the video id.
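The processing step can be tried without a network connection. The sketch below applies the same filtering and joining to a small hand-written list; the sample entries are made up for illustration, and the helper name join_segments is hypothetical. It assumes each dictionary carries at least a "text" key, as used in the code above.

```python
def join_segments(segments):
    """Join segment texts into one string, skipping '[Music]' markers."""
    parts = [seg["text"] for seg in segments if seg["text"] != "[Music]"]
    script = " ".join(parts)
    return script, len(script.split())


# Made-up sample data in the shape of a list of subtitle segments.
sample = [
    {"text": "hello everyone", "start": 0.0, "duration": 1.5},
    {"text": "[Music]", "start": 1.5, "duration": 2.0},
    {"text": "welcome back", "start": 3.5, "duration": 1.2},
]

print(join_segments(sample))
# prints: ('hello everyone welcome back', 4)
```

Using " ".join() on a list avoids the trailing space that repeated string concatenation leaves behind.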

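The package raises an exception if the video whose id you passed has no subtitles, so it is worth wrapping the call in a try/except block. Newer releases of the package may expose a different interface than get_transcript(), so check the package documentation if the call is not available in your installed version.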
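One way to keep a script running when a video has no subtitles is to catch the failure and return None. This is a minimal sketch, not the package's own error-handling API: the helper name safe_transcript is hypothetical, the fetching function is passed in as an argument so the sketch does not depend on a particular package version, and it catches a broad Exception on purpose. In real code you would narrow this to the exception classes your installed version documents.

```python
from typing import Callable, Optional


def safe_transcript(video_id: str, fetch: Callable[[str], list]) -> Optional[str]:
    """Return the transcript text, or None if fetching fails."""
    try:
        segments = fetch(video_id)
    except Exception as err:  # broad on purpose; narrow this in real code
        print(f"Could not get a transcript for {video_id}: {err}")
        return None
    return " ".join(seg["text"] for seg in segments)


# Usage with the package (assumes it is installed and provides get_transcript):
# from youtube_transcript_api import YouTubeTranscriptApi
# text = safe_transcript("Y8Tko2YC5hA", YouTubeTranscriptApi.get_transcript)
```

Passing the fetch function in also makes the wrapper easy to try with a stand-in function before touching the network.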
