Alibaba Open-Sources Tongyi DeepResearch: 30B MoE LLM Built for Long-Horizon Web Research
What Tongyi DeepResearch is
Alibaba Tongyi Lab has released Tongyi-DeepResearch-30B-A3B, an open-source, agent-specialized large language model designed for long-horizon, tool-augmented information-seeking. The model uses a mixture-of-experts (MoE) architecture with about 30.5 billion total parameters and roughly 3.0–3.3 billion active parameters per token. That design aims to keep inference cost close to that of a small dense model while preserving the specialist capacity needed for complex reasoning in multi-turn research workflows.
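As a rough illustration of why the MoE layout keeps inference cheap, the sketch below compares per-token decode compute for the reported active-parameter count against a dense model of the same total size; the 2 × parameters FLOPs-per-token rule of thumb is a standard approximation used for illustration, not a figure from the release.

```python
# Back-of-the-envelope view of the MoE inference profile described above.
# Parameter counts come from the release; the ~2 * params FLOPs-per-token
# decode rule of thumb is an approximation used here for illustration only.
total_params = 30.5e9    # total MoE parameters
active_params = 3.3e9    # upper end of the reported active-parameter range

flops_per_token_moe = 2 * active_params            # compute scales with params actually touched
flops_per_token_dense_same_size = 2 * total_params # hypothetical dense model of equal size

print(f"active fraction: {active_params / total_params:.1%}")
print(f"per-token compute vs. an equally sized dense model: "
      f"{flops_per_token_moe / flops_per_token_dense_same_size:.1%}")
```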
Benchmarks and reported performance
Tongyi DeepResearch reports leading results on several agentic search and deep-research benchmarks. Notable scores include:
- Humanity’s Last Exam (HLE): 32.9
- BrowseComp: 43.4 (EN) and 46.7 (ZH)
- xbench-DeepSearch: 75
The team also reports strong performance across WebWalkerQA, GAIA, FRAMES, and SimpleQA. According to the release, the model matches or outperforms many proprietary and open-source agents on these agentic, tool-mediated tasks.
Architecture and inference profile
Key architectural and inference details:
- MoE routing lineage similar to Qwen3-MoE, with ≈30.5B total parameters and ≈3.0–3.3B active parameters per token.
- Large context window: 128K tokens, enabling long, tool-augmented browsing sessions and iterative synthesis across large document sets.
- Dual inference modes:
- Native ReAct for evaluating intrinsic reasoning and interactive tool use (a minimal loop in this style is sketched below).
- IterResearch “Heavy” mode for test-time scaling that performs structured multi-round synthesis and reconstruction of context to reduce noise accumulation over long interactions.
This combination targets a practical balance between throughput and specialist capability for deep research tasks.
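To make the native ReAct mode concrete, the following is a minimal sketch of such a loop. It assumes the model is served behind an OpenAI-compatible endpoint (for example via vLLM) with a single hypothetical web_search tool; the prompt format, tool schema, and served model name are illustrative assumptions, not the release's exact interface.

```python
# Minimal ReAct-style agent loop (illustrative; not the release's exact format).
# Assumes an OpenAI-compatible server at localhost:8000 and a stub search tool.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def web_search(query: str) -> str:
    """Hypothetical search tool; swap in a real retrieval backend."""
    return f"(stub results for: {query})"

TOOLS = {"web_search": web_search}

messages = [
    {"role": "system", "content": (
        "Alternate Thought / Action / Observation. Emit actions as a line "
        'starting with "Action:" followed by JSON {"tool": ..., "args": {...}}. '
        'End with "Final Answer:" when done.')},
    {"role": "user", "content": "Which lab released Tongyi DeepResearch, and under what license?"},
]

for _ in range(8):                                   # cap the number of rounds
    reply = client.chat.completions.create(
        model="Tongyi-DeepResearch-30B-A3B",         # assumed served model name
        messages=messages,
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    if "Final Answer:" in reply:
        break
    actions = [line for line in reply.splitlines() if line.startswith("Action:")]
    if not actions:
        break                                        # no action and no answer: stop
    call = json.loads(actions[-1].removeprefix("Action:").strip())
    observation = TOOLS[call["tool"]](**call["args"])
    messages.append({"role": "user", "content": f"Observation: {observation}"})
```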
Training pipeline: synthetic data and on-policy RL
Tongyi DeepResearch is trained end-to-end as an agent rather than only a chat model. The release highlights a fully automated and scalable data engine powering the model:
- Agentic continual pre-training (CPT): large-scale synthetic trajectories synthesized from curated corpora, historical tool traces, and graph-structured knowledge to teach retrieval, browsing, and multi-source fusion.
- Agentic supervised fine-tuning (SFT) cold-start: trajectories formatted in ReAct and IterResearch styles to instill schema-consistent planning and tool use.
- On-policy reinforcement learning: Group Relative Policy Optimization (GRPO) with token-level policy gradients, leave-one-out advantage estimation, and selective negative-sample filtering to stabilize learning in dynamic web environments (the advantage step is sketched after this list).
These elements are designed to make the model robust to multi-turn tool interactions and to reduce hallucination and drift during long sessions.
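As one concrete piece of that recipe, here is a minimal sketch of group-relative, leave-one-out advantage estimation in the spirit of GRPO: each rollout's baseline is the mean reward of the other rollouts sampled for the same prompt. This is an illustrative reimplementation of that single step, not the released training code, and the pass/fail rewards are made up.

```python
# Leave-one-out advantage estimation for one prompt's group of rollouts,
# in the spirit of GRPO. Illustrative only; not the released training code.
import numpy as np

def leave_one_out_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: shape (group_size,), one scalar reward per sampled rollout."""
    n = rewards.shape[0]
    # Baseline for rollout i is the mean reward of the other n - 1 rollouts,
    # so a rollout is only credited for beating the rest of its own group.
    baselines = (rewards.sum() - rewards) / (n - 1)
    return rewards - baselines

# Hypothetical pass/fail rewards for 8 rollouts of the same research query.
rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0])
print(leave_one_out_advantages(rewards))  # positive for successes, negative otherwise
```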
Role in document and web research workflows
Deep-research tasks typically require four capabilities: long-horizon planning, iterative retrieval and verification across multiple sources, evidence tracking with low hallucination, and synthesis under large contexts. Tongyi DeepResearch addresses these with:
- A 128K token context window for accumulating and reusing evidence across many steps.
- IterResearch rollouts that restructure the context each round, keeping only essential artifacts to mitigate context bloat and error propagation (a minimal round structure is sketched after this list).
- ReAct training and evaluation, which demonstrates that these behaviors are learned rather than solely engineered via prompting.
The reported benchmark gains suggest improved robustness on multi-hop, tool-mediated queries where previous agents often overfit to prompt patterns or saturate at shallow depths.
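The sketch below illustrates the per-round reconstruction idea behind IterResearch-style rollouts: instead of appending every past message, each round rebuilds a compact workspace (question, evolving report, latest observation). The field names, round structure, and the model_step / tool_step callables are assumptions made for illustration, not the release's actual format.

```python
# Illustrative IterResearch-style round structure: the prompt is rebuilt each
# round from a compact workspace so noise does not accumulate over long sessions.
from dataclasses import dataclass, field

@dataclass
class Workspace:
    question: str
    report: str = ""                 # continuously rewritten synthesis
    last_observation: str = ""       # only the most recent tool output is kept
    citations: list[str] = field(default_factory=list)

def build_round_prompt(ws: Workspace) -> str:
    # Prior rounds' raw transcripts are deliberately discarded; only the
    # distilled workspace is carried forward.
    return (
        f"Question: {ws.question}\n"
        f"Current report:\n{ws.report or '(empty)'}\n"
        f"Latest observation:\n{ws.last_observation or '(none)'}\n"
        "Choose the next action (search / read / finalize) and update the report."
    )

def run_round(ws: Workspace, model_step, tool_step) -> bool:
    """One round: the model updates the report and picks an action; a tool call
    produces the next observation. Returns True once the agent finalizes."""
    decision = model_step(build_round_prompt(ws))     # e.g. one LLM call
    ws.report = decision["updated_report"]
    if decision["action"] == "finalize":
        return True
    ws.last_observation = tool_step(decision["action"], decision["args"])
    return False
```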
Key features summarized
- MoE efficiency: ~30.5B total parameters with ~3.0–3.3B active per token, offering a low-cost inference envelope with specialist capacity.
- 128K context window for extended browsing and synthesis.
- Dual inference paradigms: native ReAct and IterResearch Heavy for test-time scaling.
- Automated agentic data engine supporting CPT, SFT, and RL.
- On-policy RL with GRPO and token-level policy gradients for stability in web settings.
- Reported state-of-the-art results on several deep-research suites.
Practical implications and where to find the release
For teams building long-horizon research agents, Tongyi DeepResearch represents a reproducible, open-source stack that trades off inference cost and capability in a practical way. The release includes model weights under an Apache-2.0 license, inference scripts, and evaluation utilities. Models and code are available on platforms like Hugging Face and GitHub with technical details, tutorials, and notebooks to help practitioners reproduce and extend the work.
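As a starting point, a minimal weight-loading sketch with Hugging Face transformers is shown below. The repository id is an assumption based on the release naming, and multi-GPU or quantized setups will differ; check the official model card for the exact id and recommended generation settings.

```python
# Minimal weight-loading sketch via Hugging Face transformers (requires accelerate
# for device_map="auto"). The repo id below is an assumption; verify it on the
# official Hugging Face page before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Alibaba-NLP/Tongyi-DeepResearch-30B-A3B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # the MoE still totals ~30.5B parameters on disk
    device_map="auto",
)

prompt = "List three open challenges for long-horizon web research agents."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```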
For researchers focused on multi-turn web exploration, evidence-driven synthesis, and tool-mediated agent workflows, this release is worth investigating as both a benchmark and a baseline for further development.