
Building Modular Self-Correcting QA Systems with DSPy and Google's Gemini 1.5

This tutorial demonstrates building a modular, self-correcting QA system with DSPy and Google’s Gemini 1.5, featuring retrieval-augmented generation and prompt optimization.

Leveraging DSPy and Gemini for Intelligent QA Systems

This tutorial develops a modular, self-correcting question-answering (QA) system using the DSPy framework on top of Google's Gemini 1.5 Flash model. The foundation is a set of structured Signatures that specify each module's input and output behavior, which lets DSPy compose reliable AI pipelines.
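Before defining any modules, DSPy must be pointed at a language model. A minimal setup sketch, assuming a LiteLLM-style model string for Gemini and a `GEMINI_API_KEY` environment variable (both assumptions; adjust to your DSPy version and credentials):

```python
import os
import dspy

# Configure DSPy to use Gemini 1.5 Flash as the default LM.
# The model string follows LiteLLM conventions; the API key is
# read from an environment variable (assumed name).
lm = dspy.LM("gemini/gemini-1.5-flash", api_key=os.getenv("GEMINI_API_KEY"))
dspy.configure(lm=lm)
```

All ChainOfThought and Predict modules defined below will then route their calls through this configured model.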

Declarative Composition with DSPy

DSPy's declarative programming approach allows us to create composable modules like AdvancedQA and SimpleRAG. AdvancedQA generates answers with reasoning and verifies factual correctness, implementing self-correction by retrying with refined context when necessary. SimpleRAG simulates retrieval-augmented generation by fetching relevant documents from a knowledge base using a keyword-based retriever.

Defining Signatures for Structured Inputs and Outputs

Two DSPy Signatures are defined: QuestionAnswering, which takes context and question inputs and produces reasoning and an answer; and FactualityCheck, which verifies if the answer is factually accurate given the context.

import dspy


class QuestionAnswering(dspy.Signature):
    """Answer questions based on given context with reasoning."""
    context: str = dspy.InputField(desc="Relevant context information")
    question: str = dspy.InputField(desc="Question to answer")
    reasoning: str = dspy.OutputField(desc="Step-by-step reasoning")
    answer: str = dspy.OutputField(desc="Final answer")
 
 
class FactualityCheck(dspy.Signature):
    """Verify if an answer is factually correct given context."""
    context: str = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.InputField()
    is_correct: bool = dspy.OutputField(desc="True if answer is factually correct")

Implementing Self-Correction with AdvancedQA

The AdvancedQA module uses a Chain-of-Thought predictor to generate answers with reasoning and a fact-checker to validate them. If the answer is incorrect, it refines the context by including the previous incorrect answer and retries up to a maximum number of attempts.

class AdvancedQA(dspy.Module):
    def __init__(self, max_retries: int = 2):
        super().__init__()
        self.max_retries = max_retries
        self.qa_predictor = dspy.ChainOfThought(QuestionAnswering)
        self.fact_checker = dspy.Predict(FactualityCheck)
       
    def forward(self, context: str, question: str) -> dspy.Prediction:
        prediction = self.qa_predictor(context=context, question=question)
       
        for attempt in range(self.max_retries):
            fact_check = self.fact_checker(
                context=context,
                question=question,
                answer=prediction.answer
            )
           
            if fact_check.is_correct:
                break
               
            refined_context = f"{context}\n\nPrevious incorrect answer: {prediction.answer}\nPlease provide a more accurate answer."
            prediction = self.qa_predictor(context=refined_context, question=question)
       
        return prediction

Simple Retrieval with SimpleRAG

The SimpleRAG module retrieves the most relevant documents from a knowledge base via keyword matching, then passes the retrieved context to AdvancedQA for reasoning and self-correction.

from typing import List


class SimpleRAG(dspy.Module):
    def __init__(self, knowledge_base: List[str]):
        super().__init__()
        self.knowledge_base = knowledge_base
        self.qa_system = AdvancedQA()
       
    def retrieve(self, question: str, top_k: int = 2) -> str:
        # Simple keyword-based retrieval (in practice, use vector embeddings)
        scored_docs = []
        question_words = set(question.lower().split())
       
        for doc in self.knowledge_base:
            doc_words = set(doc.lower().split())
            score = len(question_words.intersection(doc_words))
            scored_docs.append((score, doc))
       
        # Return top-k most relevant documents
        scored_docs.sort(key=lambda pair: pair[0], reverse=True)
        return "\n\n".join([doc for _, doc in scored_docs[:top_k]])
   
    def forward(self, question: str) -> dspy.Prediction:
        context = self.retrieve(question)
        return self.qa_system(context=context, question=question)

Training and Optimization

A knowledge base and training examples are prepared to optimize the QA system. Note that DSPy's BootstrapFewShot optimizer does not fine-tune model weights; it refines prompts by bootstrapping few-shot demonstrations from the training examples, steering the system toward more accurate answers.
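BootstrapFewShot needs two ingredients: training examples and a metric. Both can be sketched in plain Python, assuming (as DSPy metrics do) that the metric receives a gold example and a prediction that each expose an `answer` attribute; the QA pairs below are illustrative placeholders, not from a real dataset:

```python
from types import SimpleNamespace

# Illustrative training pairs (placeholders).
train_pairs = [
    {"question": "What does DSPy use to specify module I/O?",
     "answer": "Signatures"},
    {"question": "Which model powers this pipeline?",
     "answer": "Gemini 1.5 Flash"},
]

def accuracy_metric(example, pred, trace=None):
    """Containment check: the gold answer must appear in the prediction."""
    return example.answer.lower() in pred.answer.lower()

# Quick self-check with stand-in objects.
gold = SimpleNamespace(answer="Signatures")
pred = SimpleNamespace(answer="DSPy uses Signatures for this.")
print(accuracy_metric(gold, pred))  # True
```

In DSPy proper, each pair would be wrapped as `dspy.Example(**pair).with_inputs("question")`, and the optimization step would look roughly like `dspy.BootstrapFewShot(metric=accuracy_metric).compile(SimpleRAG(kb), trainset=trainset)` (exact names hedged against your DSPy version).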

Evaluation and Results

An accuracy metric checks if the predicted answers contain the correct responses. Testing before and after optimization shows improved accuracy, demonstrating DSPy’s ability to optimize QA pipelines effectively.
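The before/after comparison can be driven by a small helper that runs any QA callable over held-out pairs and reports containment accuracy. A minimal sketch, assuming the system returns an object with an `answer` field (here faked with a lambda purely for illustration):

```python
from types import SimpleNamespace

def evaluate(qa_system, test_pairs):
    """Fraction of questions whose gold answer appears in the prediction."""
    hits = 0
    for pair in test_pairs:
        pred = qa_system(pair["question"])
        if pair["answer"].lower() in pred.answer.lower():
            hits += 1
    return hits / len(test_pairs)

# Stand-in "system" that always gives the same answer.
fake_qa = lambda q: SimpleNamespace(answer="DSPy optimizes prompts with BootstrapFewShot.")
pairs = [
    {"question": "What does BootstrapFewShot optimize?", "answer": "prompts"},
    {"question": "What is the capital of France?", "answer": "Paris"},
]
print(evaluate(fake_qa, pairs))  # 0.5
```

Running `evaluate` on the same test set before and after optimization quantifies the optimizer's gain.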

Summary of Key Concepts

This tutorial highlights important DSPy features: defining Signatures, building modular systems, implementing self-correction, retrieval-augmented generation, prompt optimization with BootstrapFewShot, and performance evaluation — all powered by Google’s Gemini 1.5 Flash API.
