WebThinker: Empowering Large Reasoning Models for Autonomous Web Search and Scientific Reporting

Limitations of Large Reasoning Models in Complex Research

Large reasoning models (LRMs) perform well in areas such as mathematics, coding, and scientific reasoning. They struggle, however, with tasks that require complex information retrieval from the web and multi-step reasoning to produce accurate scientific reports. Because they rely mainly on internal knowledge, they cannot explore external web information thoroughly.

The Need for Deep Integration of Web Exploration and Reasoning

Current open-source deep search agents primarily use Retrieval-Augmented Generation (RAG) techniques with fixed workflows, which restricts LRMs from fully leveraging web data during research. Enhancing LRMs with dynamic web exploration capabilities is essential to overcome these limitations.

Introducing WebThinker: A Novel Deep Research Agent

Researchers from Renmin University of China, BAAI, and Huawei Poisson Lab have developed WebThinker, a deep research agent designed to empower LRMs to autonomously search the web, navigate web pages, and draft research reports in real time. WebThinker features a Deep Web Explorer module that allows LRMs to dynamically identify knowledge gaps and retrieve relevant information.
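The explore-until-the-gap-is-filled idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the in-memory TOY_WEB, the toy_search and explore functions, and the use of a single needed_term as a stand-in for a "knowledge gap". In the real system the LRM itself decides what is missing and which links to follow.

```python
import re
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    text: str
    links: list = field(default_factory=list)


# Hypothetical in-memory "web"; a real agent would call a search API and fetch pages.
TOY_WEB = {
    "a.example/overview": Page(
        "a.example/overview",
        "WebThinker is a deep research agent.",
        ["a.example/training"],
    ),
    "a.example/training": Page(
        "a.example/training",
        "Training uses iterative online DPO.",
    ),
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def toy_search(query):
    """Return URLs of pages that share at least one word with the query."""
    words = tokenize(query)
    return [url for url, page in TOY_WEB.items() if words & tokenize(page.text)]


def explore(query, needed_term, max_pages=5):
    """Search, then follow links breadth-first until a page mentions needed_term.

    Returns (page_text or None, list of visited URLs).
    """
    queue = deque(toy_search(query))
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited or url not in TOY_WEB:
            continue
        visited.append(url)
        page = TOY_WEB[url]
        if needed_term.lower() in page.text.lower():
            return page.text, visited
        queue.extend(page.links)  # navigate deeper when the gap is not yet filled
    return None, visited


text, path = explore("research agent", "DPO")
print(text)
print(path)
```

The search only surfaces the overview page; the answer is found by following its link, which is the behavior a fixed retrieve-then-generate workflow would miss.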

Autonomous Think-Search-and-Draft Strategy

WebThinker interleaves reasoning, information gathering, and report drafting in a single process rather than running them as separate stages. The model is trained with reinforcement learning (RL), using iterative online Direct Preference Optimization (DPO), so that it learns to use its research tools more efficiently.
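For readers unfamiliar with Direct Preference Optimization, the sketch below computes the standard DPO loss for a single preference pair. It shows the general objective, not WebThinker's training code: the function name dpo_loss, the beta default of 0.1, and the example log-probabilities are all assumptions. In an iterative online setup, the chosen and rejected trajectories would be re-sampled from the current policy in each round.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs.

    loss = -log(sigmoid(beta * margin)), where margin measures how much more the
    policy prefers the chosen response than the frozen reference model does.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable softplus(-z), which equals -log(sigmoid(z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Illustrative numbers only: the policy already favors the chosen trajectory.
print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

When the policy and the reference agree exactly, the loss is log 2; it falls below that as the policy shifts probability toward the preferred trajectory and rises above it when the policy favors the rejected one.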

Dual Modes: Problem-Solving and Report Generation

The framework operates in two modes:

Problem-Solving Mode: Utilizes the Deep Web Explorer for complex task resolution by invoking web search during reasoning.
Report Generation Mode: Enables LRMs to autonomously create detailed scientific reports, assisted by an additional language model to handle report-writing tools.
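The difference between the two modes can be pictured as a difference in which tools the reasoning model may call. The sketch below is hypothetical throughout: the tool names search and draft_section, the TOOLS table, and the scripted action list are assumptions, and it assumes that search stays available in report mode. The stubs stand in for a real web search and for the additional language model that handles report writing.

```python
from enum import Enum


class Mode(Enum):
    PROBLEM_SOLVING = "problem_solving"
    REPORT_GENERATION = "report_generation"


# Hypothetical tool names; the article does not list the actual tool set.
TOOLS = {
    Mode.PROBLEM_SOLVING: {"search"},
    Mode.REPORT_GENERATION: {"search", "draft_section"},
}


def run(mode, actions):
    """Replay a scripted list of (tool, argument) actions under the given mode."""
    notes, report = [], []
    for tool, arg in actions:
        if tool not in TOOLS[mode]:
            raise ValueError(f"{tool!r} is not available in {mode.value} mode")
        if tool == "search":
            # Stand-in for a real web search call.
            notes.append(f"result for: {arg}")
        elif tool == "draft_section":
            # Stand-in for the assistant model that turns gathered notes into prose.
            report.append(f"{arg}: based on {len(notes)} note(s)")
    return {"notes": notes, "report": report}


script = [("search", "LRM web agents"), ("draft_section", "Introduction")]
print(run(Mode.REPORT_GENERATION, script))
```

Running the same script in problem-solving mode raises a ValueError at the draft_section step, since drafting is only exposed in report mode in this sketch.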

Extensive Evaluation and Performance Gains

WebThinker was tested on complex reasoning and report generation datasets, including SuperGPQA, WebWalkerQA, and OpenThoughts. The WebThinker-32B-Base model outperformed previous methods, with improvements of 22.9% on WebWalkerQA and 20.4% on the HLE benchmark. It also surpassed other advanced systems in scientific report generation and adapted well across different LRM architectures.

Future Directions

The research team plans to expand WebThinker’s capabilities by integrating multimodal reasoning, exploring advanced tool learning techniques, and developing GUI-based web exploration features.

This advancement marks a significant step towards more powerful intelligent systems capable of handling complex, knowledge-intensive real-world challenges.