Transform YouTube Transcripts into Interactive PDFs with Lyzr Chatbot

Extracting and Analyzing YouTube Transcripts Using Lyzr

This tutorial shows how to extract, process, and analyze YouTube video transcripts with Lyzr, an AI-powered framework for interacting with textual data. By combining Lyzr's ChatBot with the youtube-transcript-api and fpdf2 libraries, users can convert video transcripts into structured PDF documents and then query and analyze them interactively.

Setting Up the Environment

First, install the required Python libraries: lyzr for AI chat capabilities, youtube-transcript-api for transcript extraction, fpdf2 for PDF creation, and ipywidgets for the interactive chat interface. The DejaVu Sans font is also installed so that generated PDFs can render full Unicode text. The commands are written for a notebook, where the ! prefix runs a shell command; the apt-get step assumes a Debian-based environment with root access, such as Google Colab.

!pip install lyzr youtube-transcript-api fpdf2 ipywidgets
!apt-get update -qq && apt-get install -y fonts-dejavu-core

Configuring OpenAI API Access

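The OpenAI API key is stored in the OPENAI_API_KEY environment variable and also assigned to the openai module, which lets the Lyzr framework use OpenAI's language models. Assign the environment variable before reading it; reading it first leaves openai.api_key as None unless the key was already set.

import os
import openai

# Set the environment variable first, then read it back into the openai module.
os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY_HERE"
openai.api_key = os.getenv("OPENAI_API_KEY")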

Key Libraries for Transcript Processing and PDF Generation

The tutorial imports several libraries: json for data handling, Lyzr's ChatBot for AI-driven interactions, YouTubeTranscriptApi for transcript retrieval, FPDF for PDF creation, ipywidgets for UI components, and re for text processing.

import json
from lyzr import ChatBot
from youtube_transcript_api import YouTubeTranscriptApi, TranscriptsDisabled, NoTranscriptFound, CouldNotRetrieveTranscript
from fpdf import FPDF
from ipywidgets import Textarea, Button, Output, Layout
from IPython.display import display, Markdown
import re

Converting Transcripts to PDFs

The function transcript_to_pdf downloads a YouTube video's transcript and converts it into a well-formatted PDF. It handles exceptions for unavailable transcripts, ensures Unicode support using DejaVuSans font, and processes text to avoid layout issues caused by long words or formatting.
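A minimal sketch of such a function follows; it illustrates the steps under stated assumptions rather than reproducing the tutorial's exact code. It assumes fpdf2 and the font path used by Debian's fonts-dejavu-core package (/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf). The helpers clean_transcript_text, break_long_words and fetch_segments, the DEJAVU_PATH constant, the output_path parameter and the 40-character wrap limit are all hypothetical. fetch_segments falls back to the instance-based fetch() call because newer youtube-transcript-api releases replace the static get_transcript helper.

```python
import re

# Assumed location of the font installed by the fonts-dejavu-core package.
DEJAVU_PATH = "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"


def clean_transcript_text(segments):
    """Join transcript segments (dicts with a "text" key) into one string,
    collapsing embedded newlines and repeated whitespace (hypothetical helper)."""
    joined = " ".join(seg["text"] for seg in segments)
    return re.sub(r"\s+", " ", joined).strip()


def break_long_words(text, max_len=40):
    """Insert spaces into any token longer than max_len so FPDF can wrap it
    (hypothetical helper; the limit of 40 is an arbitrary choice)."""
    def split_token(match):
        token = match.group(0)
        return " ".join(token[i:i + max_len] for i in range(0, len(token), max_len))
    return re.sub(r"\S{%d,}" % (max_len + 1), split_token, text)


def fetch_segments(video_id):
    """Return a list of {"text", "start", "duration"} dicts (hypothetical helper)."""
    from youtube_transcript_api import YouTubeTranscriptApi
    if hasattr(YouTubeTranscriptApi, "get_transcript"):
        # Older releases expose a static helper.
        return YouTubeTranscriptApi.get_transcript(video_id)
    # Newer releases use an instance method returning a FetchedTranscript.
    return YouTubeTranscriptApi().fetch(video_id).to_raw_data()


def transcript_to_pdf(video_id, output_path):
    """Write a video's transcript to a Unicode PDF.

    Returns output_path on success, or None if no transcript is available.
    """
    # Imported here so the text helpers above work without these packages.
    from youtube_transcript_api import CouldNotRetrieveTranscript
    from fpdf import FPDF

    try:
        segments = fetch_segments(video_id)
    except CouldNotRetrieveTranscript as err:
        # TranscriptsDisabled and NoTranscriptFound both subclass this error.
        print(f"[!] No transcript for {video_id}: {type(err).__name__}")
        return None

    body = break_long_words(clean_transcript_text(segments))

    pdf = FPDF()
    pdf.add_page()
    pdf.add_font("DejaVu", "", DEJAVU_PATH)  # TrueType font for full Unicode
    pdf.set_font("DejaVu", size=11)
    # One multi_cell call wraps the whole text across lines and pages.
    pdf.multi_cell(0, 6, f"Transcript for video {video_id}\n\n{body}")
    pdf.output(output_path)
    return output_path
```

In a notebook, transcript_to_pdf("VIDEO_ID", "VIDEO_ID_transcript.pdf") would return the PDF path on success. Writing the whole body with a single multi_cell call avoids cursor-position problems that can arise when many full-width cells are chained.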

Interactive Chat Interface

The create_interactive_chat function builds an interactive chat interface allowing users to ask questions related to the transcript content. It uses ipywidgets to capture user input and display responses generated by the Lyzr ChatBot.
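One way to build such an interface is sketched below. It assumes Lyzr's convention that agent.chat(question) returns an object whose .response attribute holds the answer text, and falls back to str() otherwise. The helper ask_agent, the widget labels and the layout sizes are hypothetical choices, not the tutorial's exact code.

```python
def ask_agent(agent, question):
    """Send one question to a chat agent and return the answer as plain text.

    Assumes agent.chat() returns an object with a .response attribute
    (Lyzr's convention); anything else is converted with str().
    """
    reply = agent.chat(question.strip())
    return getattr(reply, "response", str(reply))


def create_interactive_chat(agent):
    """Show a question box, a Send button and an answer area in a notebook."""
    # Imported here so ask_agent can be used outside a notebook.
    from ipywidgets import Textarea, Button, Output, Layout
    from IPython.display import display, Markdown

    input_box = Textarea(
        placeholder="Ask a question about the transcript...",
        layout=Layout(width="80%", height="80px"),
    )
    send_button = Button(description="Send", button_style="primary")
    output_area = Output(layout=Layout(width="80%", border="1px solid gray"))

    def on_send(_button):
        question = input_box.value
        if not question.strip():
            return  # ignore empty submissions
        with output_area:
            display(Markdown(f"**You:** {question}"))
            try:
                answer = ask_agent(agent, question)
            except Exception as err:  # show API errors in the widget
                answer = f"Error: {err}"
            display(Markdown(f"**Bot:** {answer}"))
        input_box.value = ""  # clear the box for the next question

    send_button.on_click(on_send)
    display(input_box, send_button, output_area)
```

Calling create_interactive_chat(agent) in a notebook cell renders the widgets; each click on Send appends the question and the agent's answer to the output area.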

Main Processing Pipeline

The main function processes a list of YouTube video IDs, converts their transcripts into PDFs, and initializes a Lyzr PDF-chat agent for transcript analysis. It generates summaries, insights, quiz questions, and creative prompts, saving responses into JSON and Markdown formats. Additionally, if multiple transcripts are processed, it compares them to highlight thematic differences. Finally, it launches the interactive chat interface for user engagement.
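The pipeline can be sketched as follows. This assumes Lyzr's ChatBot.pdf_chat(input_files=[...]) constructor, a transcript_to_pdf(video_id, output_path) function that returns the PDF path or None, and a create_interactive_chat(agent) function as described above. The prompt wording, output file names and the helpers build_prompts and save_results are hypothetical.

```python
import json

# Hypothetical prompt wording for each analysis step.
ANALYSIS_PROMPTS = {
    "summary": "Summarize the transcript in about five bullet points.",
    "insights": "List the three most important insights or takeaways.",
    "quiz": "Write five multiple-choice quiz questions with answers.",
    "creative": "Suggest three creative follow-up ideas inspired by the content.",
}
COMPARISON_PROMPT = "Compare the transcripts and highlight differences in their themes."


def build_prompts(num_transcripts):
    """Return the prompts to run; add a comparison only for 2+ transcripts."""
    prompts = dict(ANALYSIS_PROMPTS)
    if num_transcripts > 1:
        prompts["comparison"] = COMPARISON_PROMPT
    return prompts


def save_results(results, json_path, md_path):
    """Write {section: answer} both as JSON and as a Markdown report."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)
    with open(md_path, "w", encoding="utf-8") as f:
        for section, answer in results.items():
            f.write(f"## {section.title()}\n\n{answer}\n\n")


def main(video_ids):
    """Convert transcripts to PDFs, analyze them, save results, start chat."""
    from lyzr import ChatBot  # assumed Lyzr PDF-chat entry point

    pdf_paths = []
    for vid in video_ids:
        path = transcript_to_pdf(vid, f"{vid}_transcript.pdf")
        if path:
            pdf_paths.append(path)
    if not pdf_paths:
        print("No transcripts could be processed.")
        return None

    agent = ChatBot.pdf_chat(input_files=pdf_paths)
    results = {}
    for section, prompt in build_prompts(len(pdf_paths)).items():
        reply = agent.chat(prompt)
        # Lyzr replies are assumed to expose the answer text as .response.
        results[section] = getattr(reply, "response", str(reply))

    save_results(results, "transcript_analysis.json", "transcript_analysis.md")
    create_interactive_chat(agent)
    return results
```

Calling main(["VIDEO_ID_1", "VIDEO_ID_2"]) would run every analysis prompt plus the comparison, since two transcripts are loaded. Keeping prompt selection and file output in small functions makes those steps easy to test without calling the API.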

Practical Applications

This approach suits researchers, educators, and content creators who want to summarize video content quickly, pull out key insights, and explore it interactively. Lyzr's PDF-chat agent turns raw transcripts into documents that can be questioned in plain language, which saves watching or rereading entire videos to find specific points.