From Zero to Chatbot: Build a Modular Conversational AI with Pipecat & HuggingFace
A practical tutorial that builds a modular conversational AI agent using Pipecat and HuggingFace, with full code examples for inference, display, and an async pipeline demo.
Overview
This tutorial walks through building a modular conversational AI agent using the Pipecat framework and HuggingFace's text-generation pipeline. The example demonstrates how to wire custom FrameProcessor classes together in a Pipeline, simulate user inputs, run asynchronous data flow with PipelineRunner and PipelineTask, and display the resulting conversation. The code uses DialoGPT-small as a simple demo model and shows how Pipecat's frame-based design supports extensibility for future features like speech modules or memory.
Installing and importing
Begin by installing required libraries and importing Pipecat primitives and the HuggingFace pipeline. The following code block shows the exact setup used in the tutorial:
!pip install -q pipecat-ai transformers torch accelerate numpy

import asyncio
import logging
from typing import AsyncGenerator

import numpy as np

print(" Checking available Pipecat frames...")
try:
    from pipecat.frames.frames import (
        Frame,
        TextFrame,
    )
    print(" Basic frames imported successfully")
except ImportError as e:
    print(f" Import error: {e}")
    from pipecat.frames.frames import Frame, TextFrame

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from transformers import pipeline as hf_pipeline
import torch

The snippet loads Pipecat's core elements (Pipeline, PipelineRunner, PipelineTask, FrameProcessor) and the HuggingFace pipeline for text generation. With these imports ready, the next step is implementing the processors that handle user text and display.
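Before implementing them, it helps to confirm the one invariant every processor in this tutorial relies on: a TextFrame carries its payload in a .text attribute. A quick sanity check (a sketch, not part of the tutorial code):

# Sanity check (sketch): TextFrame carries a .text payload, which is
# the only attribute the processors below depend on.
frame = TextFrame(text="Hello, Pipecat!")
print(type(frame).__name__, "->", frame.text)  # TextFrame -> Hello, Pipecat!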
SimpleChatProcessor: model-backed response generation
SimpleChatProcessor loads a HuggingFace text-generation model (DialoGPT-small) and maintains a simple conversation history string to provide context across turns. When it receives a TextFrame with user input, it prepares an input string, calls the HuggingFace pipeline, extracts and cleans the model's reply, updates conversation history, and pushes an AI TextFrame downstream.
The exact class implementation used in the tutorial is shown below:
class SimpleChatProcessor(FrameProcessor):
    """Simple conversational AI processor using HuggingFace"""
    def __init__(self):
        super().__init__()
        print(" Loading HuggingFace text generation model...")
        self.chatbot = hf_pipeline(
            "text-generation",
            model="microsoft/DialoGPT-small",
            pad_token_id=50256,
            do_sample=True,
            temperature=0.8,
            max_length=100
        )
        self.conversation_history = ""
        print(" Chat model loaded successfully!")

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TextFrame):
            user_text = getattr(frame, "text", "").strip()
            # Only treat frames that are not our own "AI:" replies as user input
            if user_text and not user_text.startswith("AI:"):
                print(f" USER: {user_text}")
                try:
                    # Prepend the running history so the model sees prior turns
                    if self.conversation_history:
                        input_text = f"{self.conversation_history} User: {user_text} Bot:"
                    else:
                        input_text = f"User: {user_text} Bot:"
                    response = self.chatbot(
                        input_text,
                        max_new_tokens=50,
                        num_return_sequences=1,
                        temperature=0.7,
                        do_sample=True,
                        pad_token_id=self.chatbot.tokenizer.eos_token_id
                    )
                    generated_text = response[0]["generated_text"]
                    # Keep only the text after the last "Bot:" marker and drop
                    # anything the model generated for a hypothetical next turn
                    if "Bot:" in generated_text:
                        ai_response = generated_text.split("Bot:")[-1].strip()
                        ai_response = ai_response.split("User:")[0].strip()
                        if not ai_response:
                            ai_response = "That's interesting! Tell me more."
                    else:
                        ai_response = "I'd love to hear more about that!"
                    self.conversation_history = f"{input_text} {ai_response}"
                    await self.push_frame(TextFrame(text=f"AI: {ai_response}"), direction)
                except Exception as e:
                    print(f" Chat error: {e}")
                    await self.push_frame(
                        TextFrame(text="AI: I'm having trouble processing that. Could you try rephrasing?"),
                        direction
                    )
            else:
                await self.push_frame(frame, direction)

This processor demonstrates a practical, if simple, approach to maintaining context and post-processing model outputs to extract a clean reply.
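To see that post-processing in isolation, here is a minimal sketch (standalone, using a hand-written string rather than real model output) of how the "Bot:"/"User:" splitting recovers a clean reply:

# Sketch of the reply-extraction logic on a hand-written example string.
generated_text = "User: Hello! Bot: Hi there, how are you? User: I'm fine"
# Take everything after the last "Bot:" marker...
reply = generated_text.split("Bot:")[-1].strip()
# ...and cut off any next "User:" turn the model tacked on.
reply = reply.split("User:")[0].strip()
print(reply)  # Hi there, how are you?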
TextDisplayProcessor and ConversationInputGenerator
The TextDisplayProcessor formats and prints AI responses and counts completed exchanges. The ConversationInputGenerator yields a series of scripted TextFrame messages, pausing between them to simulate a user typing. Both components are shown below:
class TextDisplayProcessor(FrameProcessor):
    """Displays text frames in a conversational format"""
    def __init__(self):
        super().__init__()
        self.conversation_count = 0

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TextFrame):
            text = getattr(frame, "text", "")
            if text.startswith("AI:"):
                print(f" {text}")
                self.conversation_count += 1
                print(f" Exchange {self.conversation_count} complete\n")
        await self.push_frame(frame, direction)

class ConversationInputGenerator:
    """Generates demo conversation inputs"""
    def __init__(self):
        self.demo_conversations = [
            "Hello! How are you doing today?",
            "What's your favorite thing to talk about?",
            "Can you tell me something interesting about AI?",
            "What makes conversation enjoyable for you?",
            "Thanks for the great chat!"
        ]

    async def generate_conversation(self) -> AsyncGenerator[TextFrame, None]:
        print(" Starting conversation simulation...\n")
        for i, user_input in enumerate(self.demo_conversations):
            yield TextFrame(text=user_input)
            if i < len(self.demo_conversations) - 1:
                await asyncio.sleep(2)

These modules separate display logic and input generation from model inference, keeping responsibilities modular and testable.
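Because ConversationInputGenerator is a plain async generator, it can be exercised on its own, with no model or pipeline involved. A minimal sketch (assuming the class above is already defined):

# Sketch: drain the generator standalone to verify its output.
async def preview_inputs():
    gen = ConversationInputGenerator()
    async for frame in gen.generate_conversation():
        print("queued:", frame.text)

# In a notebook cell: await preview_inputs()
# In a plain script:  asyncio.run(preview_inputs())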
Assembling the agent and running the demo
SimpleAIAgent composes the chat and display processors with the input generator, builds a Pipeline, and then runs the pipeline asynchronously via PipelineRunner and PipelineTask. The demo producer queues frames and signals completion when done. The exact orchestration is shown here:
class SimpleAIAgent:
    """Simple conversational AI agent using Pipecat"""
    def __init__(self):
        self.chat_processor = SimpleChatProcessor()
        self.display_processor = TextDisplayProcessor()
        self.input_generator = ConversationInputGenerator()

    def create_pipeline(self) -> Pipeline:
        return Pipeline([self.chat_processor, self.display_processor])

    async def run_demo(self):
        print(" Simple Pipecat AI Agent Demo")
        print(" Conversational AI with HuggingFace")
        print("=" * 50)
        pipeline = self.create_pipeline()
        runner = PipelineRunner()
        task = PipelineTask(pipeline)

        async def produce_frames():
            # Queue each demo input, then signal that no more frames are coming
            async for frame in self.input_generator.generate_conversation():
                await task.queue_frame(frame)
            await task.stop_when_done()

        try:
            print(" Running conversation demo...\n")
            # Run the pipeline and the frame producer concurrently
            await asyncio.gather(
                runner.run(task),
                produce_frames(),
            )
        except Exception as e:
            print(f" Demo error: {e}")
            logging.error(f"Pipeline error: {e}")

        print(" Demo completed successfully!")

Main entry, environment detection and next steps
The tutorial closes with a main coroutine that initializes the agent, runs the demo, and prints a progress summary; module-level code then detects whether it's running in Google Colab and finally awaits main() to start execution (top-level await is valid in notebook environments). Here is the exact final code block used in the example:
async def main():
    logging.basicConfig(level=logging.INFO)
    print(" Pipecat AI Agent Tutorial")
    print(" Google Colab Compatible")
    print(" Free HuggingFace Models")
    print(" Simple & Working Implementation")
    print("=" * 60)
    try:
        agent = SimpleAIAgent()
        await agent.run_demo()
        print("\n Tutorial Complete!")
        print("\n What You Just Saw:")
        print("✓ Pipecat pipeline architecture in action")
        print("✓ Custom FrameProcessor implementations")
        print("✓ HuggingFace conversational AI integration")
        print("✓ Real-time text processing pipeline")
        print("✓ Modular, extensible design")
        print("\n Next Steps:")
        print("• Add real speech-to-text input")
        print("• Integrate text-to-speech output")
        print("• Connect to better language models")
        print("• Add memory and context management")
        print("• Deploy as a web service")
    except Exception as e:
        print(f" Tutorial failed: {e}")
        import traceback
        traceback.print_exc()

try:
    import google.colab
    print(" Google Colab detected - Ready to run!")
    ENV = "colab"
except ImportError:
    print(" Local environment detected")
    ENV = "local"

print("\n" + "=" * 60)
print(" READY TO RUN!")
print("Execute this cell to start the AI conversation demo")
print("=" * 60)
print("\n Starting the AI Agent Demo...")
# Top-level await works in Colab/Jupyter; in a plain script, use asyncio.run(main())
await main()

Why this approach matters
By splitting concerns into separate stages for input generation, model inference, and display, this Pipecat-based example shows a clear path for extending the agent: swap or upgrade models, add a speech layer, attach persistent memory, or expose the pipeline as a web service. The pipeline and asynchronous runner provide a flexible backbone for plugging in additional components without rewriting the core processing logic, as the sketch below illustrates.
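For instance, a new stage can be added without touching the existing processors. The following sketch (hypothetical, not from the tutorial) defines a pass-through processor that logs every TextFrame and slots in ahead of the chat processor, which is exactly where a future speech-to-text stage would go:

# Hypothetical extension: a pass-through stage that logs text frames.
class LoggingProcessor(FrameProcessor):
    """Logs text frames as they flow through the pipeline."""
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TextFrame):
            logging.info("frame in flight: %s", frame.text)
        await self.push_frame(frame, direction)  # forward unchanged

# The pipeline grows by one element; nothing else changes:
# Pipeline([LoggingProcessor(), chat_processor, display_processor])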