Mastering LangGraph: Build a Dynamic Text Analysis Pipeline with AI
This tutorial walks through building a modular text analysis pipeline with LangGraph, incorporating classification, entity extraction, summarization, sentiment analysis, and advanced conditional flow control.
Introduction to LangGraph
LangGraph is a robust framework from LangChain designed to create stateful, multi-actor applications using large language models (LLMs). It enables developers to architect sophisticated AI agents by structuring workflows as graphs, similar to blueprints an architect uses to design a building. This graph-based approach allows seamless connection and coordination of various AI capabilities.
Key Features
- State Management: Maintain persistent information throughout interactions.
- Flexible Routing: Define complex flows between components.
- Persistence: Save and resume workflows as needed.
- Visualization: Visualize the agent's architecture for better understanding.
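To make the state-management idea concrete before diving into LangGraph itself: each node in a LangGraph workflow returns only the state keys it updates, and the graph merges those partial updates into the shared state. The following is a minimal pure-Python sketch of that merge semantics, with a hypothetical `summarize_stub` node standing in for an LLM call:

```python
from typing import TypedDict

class State(TypedDict):
    text: str
    summary: str

# A stub node: returns only the keys it updates, like a LangGraph node.
def summarize_stub(state: State) -> dict:
    return {"summary": state["text"][:20]}

def run_sequence(state: dict, nodes) -> dict:
    """Apply each node in order, merging its partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state

final = run_sequence(
    {"text": "LangGraph builds stateful agent workflows.", "summary": ""},
    [summarize_stub],
)
print(final["summary"])
```

This is only an illustration of the update model; LangGraph handles the merging (and routing between nodes) for you, as the sections below show.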
Setting Up the Environment
To get started, install the necessary packages:
!pip install langgraph langchain langchain-openai python-dotenv
Obtain and configure your OpenAI API key:
import os
from dotenv import load_dotenv
load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv('OPENAI_API_KEY')
Test your setup by invoking a simple prompt:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke("Hello! Are you working?")
print(response.content)
Building the Text Analysis Pipeline
This tutorial demonstrates a pipeline with three stages: text classification, entity extraction, and text summarization.
Defining Agent Memory
The agent's state keeps track of the text and analysis results:
from typing import TypedDict, List
class State(TypedDict):
    text: str
    classification: str
    entities: List[str]
    summary: str
Implementing Core Capabilities
Each capability is a function that processes the state:
1. Classification Node
from langchain_core.prompts import PromptTemplate
from langchain_core.messages import HumanMessage

def classification_node(state: State):
    # Classify the text into News, Blog, Research, or Other
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Classify the following text into one of the categories: News, Blog, Research, or Other.\n\nText:{text}\n\nCategory:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    classification = llm.invoke([message]).content.strip()
    return {"classification": classification}
2. Entity Extraction Node
def entity_extraction_node(state: State):
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Extract all the entities (Person, Organization, Location) from the following text. Provide the result as a comma-separated list.\n\nText:{text}\n\nEntities:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    entities = llm.invoke([message]).content.strip().split(", ")
    return {"entities": entities}
3. Summarization Node
def summarization_node(state: State):
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Summarize the following text in one short sentence.\n\nText:{text}\n\nSummary:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    summary = llm.invoke([message]).content.strip()
    return {"summary": summary}
Constructing the Workflow
Connect the nodes in sequence:
from langgraph.graph import StateGraph, END

workflow = StateGraph(State)
workflow.add_node("classification_node", classification_node)
workflow.add_node("entity_extraction", entity_extraction_node)
workflow.add_node("summarization", summarization_node)
workflow.set_entry_point("classification_node")
workflow.add_edge("classification_node", "entity_extraction")
workflow.add_edge("entity_extraction", "summarization")
workflow.add_edge("summarization", END)
app = workflow.compile()
Testing the Pipeline
Analyze sample text:
sample_text = """ OpenAI has announced the GPT-4 model, which is a large multimodal model that exhibits human-level performance on various professional benchmarks. It is developed to improve the alignment and safety of AI systems. Additionally, the model is designed to be more efficient and scalable than its predecessor, GPT-3. The GPT-4 model is expected to be released in the coming months and will be available to the public for research and development purposes. """
state_input = {"text": sample_text}
result = app.invoke(state_input)
print("Classification:", result["classification"])
print("\nEntities:", result["entities"])
print("\nSummary:", result["summary"])
Extending the Pipeline with Sentiment Analysis
Add sentiment analysis by expanding the state and including a new node:
class EnhancedState(TypedDict):
    text: str
    classification: str
    entities: List[str]
    summary: str
    sentiment: str
def sentiment_node(state: EnhancedState):
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Analyze the sentiment of the following text. Is it Positive, Negative, or Neutral?\n\nText:{text}\n\nSentiment:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    sentiment = llm.invoke([message]).content.strip()
    return {"sentiment": sentiment}
enhanced_workflow = StateGraph(EnhancedState)
enhanced_workflow.add_node("classification_node", classification_node)
enhanced_workflow.add_node("entity_extraction", entity_extraction_node)
enhanced_workflow.add_node("summarization", summarization_node)
enhanced_workflow.add_node("sentiment_analysis", sentiment_node)
enhanced_workflow.set_entry_point("classification_node")
enhanced_workflow.add_edge("classification_node", "entity_extraction")
enhanced_workflow.add_edge("entity_extraction", "summarization")
enhanced_workflow.add_edge("summarization", "sentiment_analysis")
enhanced_workflow.add_edge("sentiment_analysis", END)
enhanced_app = enhanced_workflow.compile()
enhanced_result = enhanced_app.invoke({"text": sample_text})
print("Classification:", enhanced_result["classification"])
print("\nEntities:", enhanced_result["entities"])
print("\nSummary:", enhanced_result["summary"])
print("\nSentiment:", enhanced_result["sentiment"])
Using Conditional Edges for Dynamic Routing
LangGraph supports conditional routing to execute nodes based on the data state.
def route_after_classification(state: EnhancedState) -> bool:
    category = state["classification"].lower()
    # Only News and Research texts go through entity extraction
    return category in ["news", "research"]
conditional_workflow = StateGraph(EnhancedState)
conditional_workflow.add_node("classification_node", classification_node)
conditional_workflow.add_node("entity_extraction", entity_extraction_node)
conditional_workflow.add_node("summarization", summarization_node)
conditional_workflow.add_node("sentiment_analysis", sentiment_node)
conditional_workflow.set_entry_point("classification_node")
conditional_workflow.add_conditional_edges("classification_node", route_after_classification, path_map={
    True: "entity_extraction",
    False: "summarization"
})
conditional_workflow.add_edge("entity_extraction", "summarization")
conditional_workflow.add_edge("summarization", "sentiment_analysis")
conditional_workflow.add_edge("sentiment_analysis", END)
conditional_app = conditional_workflow.compile()
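Before spending API calls on the full pipeline, the routing decision itself can be exercised in isolation. The sketch below replays the router and `path_map` lookup against stub classification labels (no LLM involved), showing which node each category would be sent to:

```python
def route_after_classification(state: dict) -> bool:
    # Same decision as the pipeline's router: News/Research branch to entity extraction
    return state["classification"].lower() in ["news", "research"]

path_map = {True: "entity_extraction", False: "summarization"}

# Stub states standing in for the classifier's output
for label in ["News", "Blog", "Research", "Other"]:
    next_node = path_map[route_after_classification({"classification": label})]
    print(label, "->", next_node)
```

This mirrors what `add_conditional_edges` does internally: the router's return value is looked up in `path_map` to pick the next node.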
# Test with news text
news_text = """
OpenAI released the GPT-4 model with enhanced performance on academic and professional tasks. It's seen as a major breakthrough in alignment and reasoning capabilities.
"""
result = conditional_app.invoke({"text": news_text})
print("Classification:", result["classification"])
print("Entities:", result.get("entities", "Skipped"))
print("Summary:", result["summary"])
print("Sentiment:", result["sentiment"])
# Test with blog text
blog_text = """
Here's what I learned from a week of meditating in silence. No phones, no talking—just me, my breath, and some deep realizations.
"""
result = conditional_app.invoke({"text": blog_text})
print("Classification:", result["classification"])
print("Entities:", result.get("entities", "Skipped (not applicable)"))
print("Summary:", result["summary"])
print("Sentiment:", result["sentiment"])
This conditional logic allows the agent to skip unnecessary steps, improving efficiency and cost-effectiveness while adapting to the input context.
LangGraph's graph-based approach provides a flexible, modular, and extensible way to build intelligent AI agents for complex text processing tasks.