From Snippets to Reports: Building a Multi‑Round Research Agent with Gemini and DuckDuckGo

August 28, 2025 · 6 min

Overview

This guide walks through a modular deep-research agent designed to run in Google Colab. The system uses Google's Gemini as the reasoning core and DuckDuckGo's Instant Answer API for lightweight lookups (the API returns topic summaries and related topics, not full web search results). It orchestrates multi-round querying with deduplication, rate control, and structured prompts. The goal is to keep API usage low while extracting concise, structured insights and producing an automated research report.

Architecture and components

The pipeline is organized into clear components:

Configuration: a dataclass to manage API keys, limits, and delays.
Search layer: a lightweight DuckDuckGo Instant Answer-based search to fetch snippets rapidly.
Extraction and analysis: Gemini-powered prompts to extract key points and synthesize themes and insights.
Orchestration: multi-round querying with deduplication and optional related-query expansion.
Reporting: automated generation of a structured report (executive summary, findings, analysis, conclusions).
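To make the data flow between these components concrete, here is a minimal offline sketch. It is not the notebook's code: `PipelineConfig`, `run_pipeline`, `fake_search`, and `fake_llm` are hypothetical names, and the two fakes stand in for DuckDuckGo and Gemini so the flow runs without network access or an API key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Source = Dict[str, str]


@dataclass
class PipelineConfig:
    # Hypothetical, trimmed-down stand-in for the notebook's ResearchConfig
    max_sources: int = 10
    rounds: int = 2


def run_pipeline(query: str,
                 search: Callable[[str], List[Source]],
                 llm: Callable[[str], str],
                 config: PipelineConfig) -> Dict[str, object]:
    # Orchestration: the original query first, then model-suggested related queries
    related = [q.strip() for q in llm(f"related: {query}").splitlines() if q.strip()]
    queries = [query] + related

    # Search layer: one search call per round
    collected: List[Source] = []
    for q in queries[:config.rounds]:
        collected.extend(search(q))

    # Deduplicate by URL, keeping the first occurrence, then cap the source count
    seen, unique = set(), []
    for source in collected:
        if source["url"] not in seen:
            seen.add(source["url"])
            unique.append(source)
    unique = unique[:config.max_sources]

    # Analysis and reporting both go through the language model
    analysis = llm("analyze: " + " ".join(s["snippet"] for s in unique))
    report = llm(f"report on {query}: {analysis}")
    return {"sources": unique, "analysis": analysis, "report": report}


def fake_search(q: str) -> List[Source]:
    # Every query returns one shared page plus one query-specific page
    return [{"url": "https://example.com/shared", "snippet": "shared fact"},
            {"url": "https://example.com/" + q.replace(" ", "-"), "snippet": "about " + q}]


def fake_llm(prompt: str) -> str:
    # Canned related queries; otherwise echo the task name before the first colon
    if prompt.startswith("related:"):
        return "agent memory\nagent planning"
    return "<" + prompt.split(":", 1)[0] + ">"


result = run_pipeline("research agents", fake_search, fake_llm, PipelineConfig())
print(len(result["sources"]), result["report"])
```

Passing the search and model functions in as parameters is a design choice of this sketch; it keeps each component swappable and testable in isolation.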

Core Python imports

Start by importing the standard-library modules, requests for HTTP calls, and the Google Generative AI SDK (google.generativeai) used to call Gemini. The imports below are taken directly from the notebook:

import os
import json
import time
import requests
from typing import List, Dict, Any
from dataclasses import dataclass
import google.generativeai as genai
from urllib.parse import quote_plus
import re

Key classes and methods

The ResearchConfig dataclass encapsulates parameters such as the Gemini API key, source limits, content-length caps, and search delay. The DeepResearchSystem class configures Gemini, queries the DuckDuckGo Instant Answer API for snippets, extracts key points, analyzes the aggregated snippets, and constructs a final report. In the listing, max_content_length is never read, and extract_key_points is defined but not called by conduct_research.

The complete class and supporting code follow:

@dataclass
class ResearchConfig:
   gemini_api_key: str
   max_sources: int = 10
   max_content_length: int = 5000
   search_delay: float = 1.0


class DeepResearchSystem:
   def __init__(self, config: ResearchConfig):
       self.config = config
       genai.configure(api_key=config.gemini_api_key)
       self.model = genai.GenerativeModel('gemini-1.5-flash')


   def search_web(self, query: str, num_results: int = 5) -> List[Dict[str, str]]:
       """Search web using DuckDuckGo Instant Answer API"""
       try:
           encoded_query = quote_plus(query)
           url = f"https://api.duckduckgo.com/?q={encoded_query}&format=json&no_redirect=1"


           response = requests.get(url, timeout=10)
           response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
           data = response.json()


           results = []


           if 'RelatedTopics' in data:
               for topic in data['RelatedTopics'][:num_results]:
                   if isinstance(topic, dict) and 'Text' in topic:
                       results.append({
                           'title': topic.get('Text', '')[:100] + '...',
                           'url': topic.get('FirstURL', ''),
                           'snippet': topic.get('Text', '')
                       })


           if not results:
               results = [{
                   'title': f"Research on: {query}",
                   'url': f"https://search.example.com/q={encoded_query}",
                   'snippet': f"General information and research about {query}"
               }]


           return results


       except Exception as e:
           print(f"Search error: {e}")
           return [{'title': f"Research: {query}", 'url': '', 'snippet': f"Topic: {query}"}]


   def extract_key_points(self, content: str) -> List[str]:
       """Extract key points using Gemini"""
       prompt = f"""
       Extract 5-7 key points from this content. Be concise and factual:


       {content[:2000]}


       Return as numbered list:
       """


       try:
           response = self.model.generate_content(prompt)
           return [line.strip() for line in response.text.split('\n') if line.strip()]
       except Exception:
           return ["Key information extracted from source"]


   def analyze_sources(self, sources: List[Dict[str, str]], query: str) -> Dict[str, Any]:
       """Analyze sources for relevance and extract insights"""
       analysis = {
           'total_sources': len(sources),
           'key_themes': [],
           'insights': [],
           'confidence_score': 0.7
       }


       all_content = " ".join([s.get('snippet', '') for s in sources])


       if len(all_content) > 100:
           prompt = f"""
           Analyze this research content for the query: "{query}"


           Content: {all_content[:1500]}


           Provide:
           1. 3-4 key themes (one line each)
           2. 3-4 main insights (one line each)
           3. Overall confidence (0.1-1.0)


           Format as JSON with keys: themes, insights, confidence
           """


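           try:
               response = self.model.generate_content(prompt)
               # Gemini may wrap the JSON in a Markdown code fence, so grab the outermost {...}
               match = re.search(r"\{.*\}", response.text, re.DOTALL)
               if match:
                   parsed = json.loads(match.group(0))
                   for key, field in (('themes', 'key_themes'), ('insights', 'insights')):
                       value = parsed.get(key, [])
                       analysis[field] = [str(v) for v in value] if isinstance(value, list) else [str(value)]
                   # Clamp the self-reported confidence to the requested 0.1-1.0 range
                   confidence = float(parsed.get('confidence', analysis['confidence_score']))
                   analysis['confidence_score'] = min(max(confidence, 0.1), 1.0)
           except Exception:
               # Keep the default analysis if the call or the JSON parsing fails
               pass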


       return analysis


   def generate_comprehensive_report(self, query: str, sources: List[Dict[str, str]],
                                   analysis: Dict[str, Any]) -> str:
       """Generate final research report"""


       sources_text = "\n".join([f"- {s['title']}: {s['snippet'][:200]}"
                                for s in sources[:5]])


       prompt = f"""
       Create a comprehensive research report on: "{query}"


       Based on these sources:
       {sources_text}


       Analysis summary:
       - Total sources: {analysis['total_sources']}
       - Confidence: {analysis['confidence_score']}


       Structure the report with:
       1. Executive Summary (2-3 sentences)
       2. Key Findings (3-5 bullet points)
       3. Detailed Analysis (2-3 paragraphs)
       4. Conclusions & Implications (1-2 paragraphs)
       5. Research Limitations


       Be factual, well-structured, and insightful.
       """


       try:
           response = self.model.generate_content(prompt)
           return response.text
       except Exception as e:
           return f"""
# Research Report: {query}


## Executive Summary
Research conducted on "{query}" using {analysis['total_sources']} sources.


## Key Findings
- Multiple perspectives analyzed
- Comprehensive information gathered
- Research completed successfully


## Analysis
The research process involved systematic collection and analysis of information related to {query}. Various sources were consulted to provide a balanced perspective.


## Conclusions
The research provides a foundation for understanding {query} based on available information.


## Research Limitations
Limited by API constraints and source availability.
           """


   def conduct_research(self, query: str, depth: str = "standard") -> Dict[str, Any]:
       """Main research orchestration method"""
       print(f"Starting research on: {query}")


       search_rounds = {"basic": 1, "standard": 2, "deep": 3}.get(depth, 2)
       sources_per_round = {"basic": 3, "standard": 5, "deep": 7}.get(depth, 5)


       all_sources = []


       search_queries = [query]


       if depth in ["standard", "deep"]:
           try:
               related_prompt = f"Generate 2 related search queries for: {query}. One line each."
               response = self.model.generate_content(related_prompt)
               # Drop list markers such as "1." or "-" that the model may prepend
               additional_queries = [re.sub(r'^\s*(?:\d+[.)]|[-*•])\s+', '', q).strip()
                                     for q in response.text.split('\n') if q.strip()][:2]
               search_queries.extend(q for q in additional_queries if q)
           except Exception:
               pass


       for i, search_query in enumerate(search_queries[:search_rounds]):
           print(f"Search round {i+1}: {search_query}")
           sources = self.search_web(search_query, sources_per_round)
           all_sources.extend(sources)
           time.sleep(self.config.search_delay)


       unique_sources = []
       seen_urls = set()
       for source in all_sources:
           if source['url'] not in seen_urls:
               unique_sources.append(source)
               seen_urls.add(source['url'])


       print(f"Analyzing {len(unique_sources)} unique sources...")


       analysis = self.analyze_sources(unique_sources[:self.config.max_sources], query)


       print("Generating comprehensive report...")


       report = self.generate_comprehensive_report(query, unique_sources, analysis)


       return {
           'query': query,
           'sources_found': len(unique_sources),
           'analysis': analysis,
           'report': report,
           'sources': unique_sources[:10]
       }
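A detail of search_web worth knowing: in Instant Answer responses, RelatedTopics can mix plain topic entries with named groups (a Name plus a nested Topics list). The notebook slices the raw list to num_results before filtering, so group entries, which have no top-level Text, are skipped while still using up slots. Below is a hedged sketch of an alternative parser that flattens groups first. `flatten_related_topics` is a hypothetical helper, and the sample payload is hand-written in that shape rather than captured from the API.

```python
from typing import Any, Dict, List


def flatten_related_topics(data: Dict[str, Any], num_results: int = 5) -> List[Dict[str, str]]:
    """Collect up to num_results topics with text, descending into named topic groups."""
    results: List[Dict[str, str]] = []
    for item in data.get("RelatedTopics", []):
        if not isinstance(item, dict):
            continue
        # A group carries "Name" plus a nested "Topics" list; a plain topic stands alone
        for topic in item.get("Topics", [item]):
            if len(results) >= num_results:
                return results
            text = topic.get("Text", "") if isinstance(topic, dict) else ""
            if not text:
                continue
            results.append({
                # Only add an ellipsis when the title was actually truncated
                "title": text if len(text) <= 100 else text[:100] + "...",
                "url": topic.get("FirstURL", ""),
                "snippet": text,
            })
    return results


# Hand-written payload in the shape of an Instant Answer response
sample = {
    "RelatedTopics": [
        {"Text": "Agent A - an example topic", "FirstURL": "https://duckduckgo.com/Agent_A"},
        {"Name": "See also", "Topics": [
            {"Text": "Agent B - grouped topic", "FirstURL": "https://duckduckgo.com/Agent_B"},
            {"Text": "Agent C - grouped topic", "FirstURL": "https://duckduckgo.com/Agent_C"},
        ]},
    ]
}

for r in flatten_related_topics(sample, num_results=2):
    print(r["url"])
```

To use it, search_web could call `flatten_related_topics(data, num_results)` in place of its RelatedTopics loop and keep the placeholder fallback unchanged.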

Quick setup for Google Colab

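A helper function wraps configuration and returns a ready-to-use DeepResearchSystem instance for Colab experimentation. Compared with the dataclass defaults, it raises max_sources from 10 to 15 and max_content_length from 5000 to 6000, and halves search_delay to 0.5 seconds: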

def setup_research_system(api_key: str) -> DeepResearchSystem:
   """Quick setup for Google Colab"""
   config = ResearchConfig(
       gemini_api_key=api_key,
       max_sources=15,
       max_content_length=6000,
       search_delay=0.5
   )
   return DeepResearchSystem(config)
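Because ResearchConfig is a plain dataclass, a variant like this one can also be derived from a base configuration with the standard library's `dataclasses.replace`, which copies the instance and overrides only the named fields. A short sketch, with the dataclass redefined so the snippet runs on its own and a placeholder key:

```python
from dataclasses import dataclass, replace


@dataclass
class ResearchConfig:
    # Same fields and defaults as the notebook's dataclass
    gemini_api_key: str
    max_sources: int = 10
    max_content_length: int = 5000
    search_delay: float = 1.0


base = ResearchConfig(gemini_api_key="placeholder-key")

# The Colab overrides from setup_research_system; every other field is copied from base
colab = replace(base, max_sources=15, max_content_length=6000, search_delay=0.5)

print(base)
print(colab)
```

The base instance is left untouched, which makes it easy to keep several presets side by side.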

Usage example and main block

The notebook includes an example main block showing initialization and running a sample query. The example prints a summary, the generated report, and a list of consulted sources:

if __name__ == "__main__":
   API_KEY = "Use Your Own API Key Here"


   researcher = setup_research_system(API_KEY)


   query = "Deep Research Agent Architecture"
   results = researcher.conduct_research(query, depth="standard")


   print("="*50)
   print("RESEARCH RESULTS")
   print("="*50)
   print(f"Query: {results['query']}")
   print(f"Sources found: {results['sources_found']}")
   print(f"Confidence: {results['analysis']['confidence_score']}")
   print("\n" + "="*50)
   print("COMPREHENSIVE REPORT")
   print("="*50)
   print(results['report'])


   print("\n" + "="*50)
   print("SOURCES CONSULTED")
   print("="*50)
   for i, source in enumerate(results['sources'][:5], 1):
       print(f"{i}. {source['title']}")
       print(f"   URL: {source['url']}")
       print(f"   Preview: {source['snippet'][:150]}...")
       print()
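Hard-coding the key in API_KEY is convenient for a quick demo but easy to leak when a notebook is shared. A hedged alternative reads it from Colab's Secrets panel through `google.colab.userdata` when running in Colab, and otherwise from an environment variable. The helper `load_api_key` and the name `GEMINI_API_KEY` are assumptions; use whatever name you stored the key under.

```python
import os
from typing import Optional


def load_api_key(name: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the key from Colab Secrets when available, else from the environment."""
    try:
        from google.colab import userdata  # importable only inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            return userdata.get(name)
        except Exception:
            # Secret missing or notebook access not granted: fall back to the environment
            pass
    return os.environ.get(name)
```

In the main block, `API_KEY = load_api_key()` would then replace the literal string, with an early exit if it returns None.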

Behavior highlights and customization

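Multi-round search: depth selects one to three search rounds (basic, standard, deep). At standard and deep depth, Gemini first suggests two related queries, and each round searches the next query in the list.
Deduplication: sources are deduplicated by URL, keeping the first occurrence, so a page returned in several rounds is analyzed once.
Rate control: the loop sleeps for search_delay seconds after each search round to space out DuckDuckGo requests; Gemini calls are not throttled.
Prompt structure: prompts truncate their inputs (2,000 characters for key-point extraction, 1,500 for analysis, 200 per snippet in the report prompt) and request structured outputs (numbered lists, JSON) to simplify downstream parsing.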
Extensibility: you can add custom ranking, domain-specific filters, or swap the search layer for more robust web scraping.
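The interaction between depth, the round count, and the related-query list is easy to misread, so here is a small offline sketch that reproduces the planning logic of conduct_research with the Gemini call replaced by a fixed list. `plan_queries` and `DEPTH_PRESETS` are hypothetical names, not part of the notebook.

```python
from typing import List, Tuple

# Same presets as conduct_research: depth -> (search rounds, results per round)
DEPTH_PRESETS = {"basic": (1, 3), "standard": (2, 5), "deep": (3, 7)}


def plan_queries(query: str, related: List[str], depth: str) -> Tuple[List[str], int]:
    """Return the queries that would actually be searched and the per-round result cap."""
    rounds, per_round = DEPTH_PRESETS.get(depth, (2, 5))  # unknown depth uses standard numbers
    queries = [query]
    if depth in ("standard", "deep"):
        queries.extend(related[:2])  # at most two model-suggested queries
    return queries[:rounds], per_round


related = ["agent memory design", "agent tool use"]
for depth in ("basic", "standard", "deep"):
    print(depth, plan_queries("research agents", related, depth))
```

At standard depth the second suggested query is never searched, and the number of raw results before deduplication is at most rounds times the per-round cap: 3, 10, and 21 for the three presets.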

Practical notes

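DuckDuckGo's Instant Answer API is lightweight but is not a full web search: for many queries RelatedTopics comes back empty, and search_web then returns a single placeholder result built from the query text. Consider a dedicated search API for deeper coverage.
Gemini handles extraction and synthesis; tune the prompts and the model choice (the notebook uses gemini-1.5-flash) to your needs and API costs.
Keep an eye on API quotas: the number of requests is set by depth, while max_sources only limits how many snippets feed the analysis prompt, which is truncated to 1,500 characters anyway.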
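To reason about quotas, it helps to count requests per conduct_research call. Reading the code: basic makes one DuckDuckGo request and up to two Gemini calls (analysis and report); standard and deep add one Gemini call for related queries and search two or three rounds. The analysis call is skipped when the combined snippets are 100 characters or shorter, and extract_key_points is never called, so it adds nothing. The sketch below encodes these counts as an upper bound; `request_budget` is a hypothetical helper, and it assumes the related-query call succeeds and yields enough queries to fill every round.

```python
from typing import Dict

# Search rounds per depth, as in conduct_research
ROUNDS = {"basic": 1, "standard": 2, "deep": 3}


def request_budget(depth: str, runs: int = 1) -> Dict[str, int]:
    """Upper bound on API requests for `runs` calls to conduct_research at one depth."""
    rounds = ROUNDS.get(depth, 2)
    related_call = 1 if depth in ("standard", "deep") else 0
    # Without related queries only the original query exists, so only one round runs
    search_requests = rounds if related_call else 1
    # Related-query call (if any) + analysis (when snippets exceed 100 chars) + report
    gemini_calls = related_call + 2
    return {"duckduckgo": search_requests * runs, "gemini": gemini_calls * runs}


for depth in ("basic", "standard", "deep"):
    print(depth, request_budget(depth, runs=20))
```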

This notebook demonstrates a compact and practical pattern to convert unstructured snippets into a structured research report by combining search, language modeling, and simple orchestration logic in Colab.