YouTube videos often contain valuable information, but watching an entire video can be time-consuming. What if you could extract the transcript of a video and generate a concise summary? In this tutorial, we will build a Python program that does exactly that! Using pytube to fetch video details and YouTubeTranscriptApi to get transcripts, we will process the text using NLTK and generate a meaningful summary.
By the end of this tutorial, you will know how to extract a YouTube video's transcript and summarize it both with the NLTK library and with an LLM via an API.

Before writing the script, ensure you have the required dependencies installed. You can install them using pip:
- pytube: fetches video metadata.
- youtube-transcript-api: retrieves video transcripts.
- nltk: performs natural language processing (NLP).
- colorama: adds color to terminal output.
- openai: handles AI-based text summarization through OpenRouter's API.

We're going to use an OpenAI API-compatible client with OpenRouter. So head on to the website, sign up if you don't already have an account, and create an API key; it's pretty straightforward. The specific model we'll be using is Mistral: Mistral Small 3.1 24B (free). It's a free model, so you don't have to pay. For more insights on how to use the API, just read through the overview section.
After grabbing your API key, open up a Python file, name it meaningfully, like youtube_transcript_summarizer.py, and follow along.
First, we import the necessary libraries for this project:

- os: for terminal width detection.
- re: for regular expressions.
- heapq: for extracting the most significant sentences.
- textwrap: for text formatting.

Next, colorama.init(autoreset=True) is called to enable cross-platform color support, ensuring that color formatting resets automatically after each output. Check this article if you're interested in learning more about Colorama.
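A minimal sketch of how this setup might look. The standard-library imports mirror the list above; the colorama import is guarded with a try/except here only so the sketch runs even without the package installed (in the real script it is a plain import):

```python
import os        # terminal width detection
import re        # regular expressions for URL parsing
import heapq     # selecting the highest-scoring sentences
import textwrap  # wrapping text for boxed output

# Guarded third-party import: the actual script imports colorama directly.
try:
    from colorama import Fore, init
    init(autoreset=True)  # reset colors automatically after each print
except ImportError:
    Fore = None  # color output simply disabled in this sketch
```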
The script downloads the necessary nltk resources (punkt for tokenization and stopwords for filtering common words), ensuring they are available without user intervention.
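A small helper sketching that download step; the function name is my own, and the nltk import is deferred so the sketch loads even where nltk isn't installed. nltk.download(..., quiet=True) is idempotent, so calling it on every run is harmless:

```python
NLTK_RESOURCES = ("punkt", "stopwords")  # tokenizer models + stopword list

def ensure_nltk_data(resources=NLTK_RESOURCES):
    """Quietly download the NLTK data the summarizer needs (idempotent)."""
    import nltk  # deferred so this sketch imports without nltk installed
    for resource in resources:
        nltk.download(resource, quiet=True)
```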
openai.OpenAI(api_key=API_KEY) initializes the OpenRouter API client, which will later be used for AI-driven summarization. Make sure to include your API key there.
The extract_video_id(youtube_url) function extracts a YouTube video ID from various URL formats, including standard, shortened (youtu.be), and embedded formats, raising an error if the ID cannot be determined.
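A runnable sketch of such a function. The exact regex patterns are my assumption, not necessarily the article's; they cover the three URL shapes mentioned (YouTube video IDs are 11 characters from [A-Za-z0-9_-]):

```python
import re

def extract_video_id(youtube_url: str) -> str:
    """Pull the 11-character video ID out of common YouTube URL formats."""
    patterns = [
        r"(?:v=|/embed/|/v/)([0-9A-Za-z_-]{11})",  # watch?v=..., /embed/..., /v/...
        r"youtu\.be/([0-9A-Za-z_-]{11})",          # shortened youtu.be links
    ]
    for pattern in patterns:
        match = re.search(pattern, youtube_url)
        if match:
            return match.group(1)
    raise ValueError(f"Could not extract a video ID from: {youtube_url}")
```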
The get_transcript(video_id) function retrieves the transcript of a video using YouTubeTranscriptApi.get_transcript(video_id), handling errors gracefully by returning an error message if the transcript is unavailable.
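YouTubeTranscriptApi.get_transcript returns a list of segment dicts (each with "text", "start", and "duration" keys), so the text fields need to be joined into one string. A sketch under those assumptions; transcript_to_text is a helper name I introduced, and the package import is deferred so the sketch loads without it installed:

```python
def transcript_to_text(segments):
    """Flatten YouTubeTranscriptApi segment dicts into a plain string."""
    return " ".join(seg["text"].strip() for seg in segments)

def get_transcript(video_id):
    """Fetch a transcript, returning an error string instead of raising."""
    try:
        # Deferred import: the real script imports this at the top.
        from youtube_transcript_api import YouTubeTranscriptApi
        return transcript_to_text(YouTubeTranscriptApi.get_transcript(video_id))
    except Exception as exc:
        return f"Error retrieving transcript: {exc}"
```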
The summarize_text_nltk(text, num_sentences=5) function processes the transcript by tokenizing it into sentences, filtering out stopwords, computing word frequencies, and scoring sentences based on their significance using nltk. It selects the num_sentences highest-scoring sentences and returns them in their original order.
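To make the scoring idea concrete, here is a self-contained sketch of the same frequency-based approach. To keep it runnable without NLTK's data files, it substitutes regex tokenization and a tiny hand-rolled stopword set for nltk's sent_tokenize, word_tokenize, and stopwords corpus, so treat it as a simplified stand-in rather than the article's exact function:

```python
import heapq
import re
from collections import Counter

# Tiny stand-in for nltk.corpus.stopwords.words("english").
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "it"}

def summarize_text_nltk(text, num_sentences=5):
    """Return the num_sentences highest-scoring sentences, original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    if len(sentences) <= num_sentences:
        return text  # nothing to condense (e.g. unpunctuated transcripts)
    # Word frequencies over the whole text, ignoring stopwords.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)
    def score(sentence):
        return sum(freq.get(w, 0) for w in re.findall(r"[a-z']+", sentence.lower()))
    top = heapq.nlargest(num_sentences, sentences, key=score)
    # Emit the selected sentences in the order they appeared.
    return " ".join(s for s in sentences if s in top)
```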
This function generates a summary using OpenRouter's Mistral model. It follows these steps: it truncates overly long transcripts, sends the text to the model with openai.chat.completions.create(), and returns the generated summary. Each LLM has its own context window; for this one, it is 96,000 tokens at the time of writing this article, which is roughly ~300k characters, but to avoid overloading the free API, I'm omitting anything that exceeds 15k characters.
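A sketch of what that call might look like. The function name, prompt wording, and model slug are my assumptions based on the description above (check OpenRouter's model page for the exact identifier), and the openai import is deferred so only the truncation helper runs without the package:

```python
MAX_CHARS = 15_000  # stay well under the free tier's context window

def truncate_transcript(text, limit=MAX_CHARS):
    """Drop anything beyond `limit` characters before calling the API."""
    return text[:limit]

def summarize_text_ai(text, api_key):
    """Ask the Mistral model on OpenRouter for a concise summary."""
    from openai import OpenAI  # deferred import for this sketch
    client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=api_key)
    response = client.chat.completions.create(
        model="mistralai/mistral-small-3.1-24b-instruct:free",  # assumed slug
        messages=[
            {"role": "system", "content": "Summarize the transcript concisely."},
            {"role": "user", "content": truncate_transcript(text)},
        ],
    )
    return response.choices[0].message.content
```

Pointing the standard OpenAI client at OpenRouter's base_url is what makes the "OpenAI API-compatible" setup work.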
The summarize_youtube_video(youtube_url, num_sentences=5) function combines the previous steps: it extracts the video ID, retrieves the transcript, summarizes it, and fetches video metadata (title, author, length, publish date, and views) using pytube.YouTube(youtube_url). It returns a dictionary containing the video title, summary, and transcript statistics.
The format_time(seconds) function converts a given number of seconds into a human-readable format like "2h 30m 15s".

The format_number(number) function formats large numbers with commas (e.g., 1234567 → 1,234,567).
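Both helpers are short enough to sketch in full; this version drops zero-valued parts from the time string (e.g. 61 → "1m 1s"), which is an assumption about the article's exact behavior:

```python
def format_time(seconds):
    """Convert seconds to a compact 'Xh Ym Zs' string, skipping zero parts."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if secs or not parts:  # always show at least '0s'
        parts.append(f"{secs}s")
    return " ".join(parts)

def format_number(number):
    """Add thousands separators: 1234567 -> '1,234,567'."""
    return f"{number:,}"
```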
The print_boxed_text(text, width=80, title=None, color=Fore.WHITE) function prints text inside a nicely formatted box with an optional color and title.
The print_summary_result(result, width=80) function displays the final summary result in a structured format. It prints a header with the video title, metadata, and the summarized transcript (for both "AI" and NLTK) with appropriate formatting and spacing. If an error occurs, it prints an error message in a red box.
The if __name__ == "__main__": block is the script's entry point. It determines the terminal width for proper formatting, prints a welcome banner, prompts the user for a YouTube URL and desired summary length, then fetches and summarizes the video transcript before displaying the results.
Make sure to install the requirements first:
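A single pip command covering the dependencies listed at the start of the article:

```shell
pip install pytube youtube-transcript-api nltk colorama openai
```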
Now let’s run our code:
It'll prompt you for the YouTube video URL. Here's the first result:
Notice that for the AI summary, out of 2,100 words in the transcription, our program condensed the text by approximately 92%, down to 165 words. We can see similar stats for the NLTK summary.
Important Note
The NLTK sentence-based summarization in our program works effectively when the transcript contains punctuation marks like commas and periods to define sentences. For transcripts lacking this punctuation, however, the entire transcript is typically returned.
The script is still useful in that case, because the AI summary does an excellent job regardless of whether punctuation is included. The transcript result above is a good example: well summarized.
I’ll show you what I mean with an example:
When we run our code against another video that does not have punctuations in its transcription, we get:
This is a transcript of a YouTube clip from Mr. Robot, featuring the scene where Elliot hacks Ron, the pedophile. Since the transcript lacks punctuation, NLTK achieves 0% condensation, returning the entire transcript to the user, but the Mistral model (LLM Summary) still does an excellent job.
While the summarization feature is useful, it has some limitations.
One limitation is that repeated requests to YouTube for transcripts may result in an IP block. YouTube can detect excessive requests and temporarily restrict access. To avoid this, it is advisable to introduce intervals between successive program runs or simply use the YouTube API.
Additionally, the quality of the summary depends on the quality of the transcript itself. If the transcript is inaccurate or lacks proper formatting, the generated summary may not be meaningful.
Despite its limitations, the summarization feature provides a quick and efficient way to condense YouTube transcripts. By ensuring high-quality transcripts, introducing intervals between runs, and handling punctuation properly, users can maximize the effectiveness of the summarization process. With these considerations, this tool remains a valuable asset for quickly extracting key insights from lengthy video transcripts. Check out our text summarization tutorial if you want to perform text summarization using transformers.
I hope you enjoyed this one. Till next time, Happy Coding!