Agentic-R1: The AI Revolutionizing Math Problem Solving by Merging Language and Tool Use

Challenges in Mathematical Reasoning Models

Recent long-chain-of-thought (long-CoT) reasoning models have pushed the boundaries of mathematical problem solving by generating detailed reasoning paths that include iterative self-verification and refinement. However, these open-source models rely solely on natural language reasoning, so every calculation is worked out in text. That makes computation expensive and leaves it prone to errors, because no step is explicitly verified.

DualDistill Framework and Agentic-R1 Model

Researchers at Carnegie Mellon University introduced DualDistill, a novel distillation framework that integrates learning from two distinct teacher models: one specializing in natural language reasoning and the other augmented with tool usage capabilities. This framework produces Agentic-R1, a versatile student model that dynamically chooses between natural language reasoning and code execution based on the problem's nature.

Agentic-R1 executes code to handle arithmetic and algorithmic challenges efficiently, while it uses natural language reasoning for abstract and conceptual problems. DualDistill performs trajectory composition to merge knowledge from both teacher models and applies self-distillation to refine the student model further. The OpenHands framework serves as the agentic reasoning teacher, and DeepSeek-R1 handles text-based reasoning.
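The article does not spell out how trajectory composition is done, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes each teacher produces one attempt per problem with a known correctness flag, and that a failed attempt from one teacher can be joined to a successful attempt from the other with a short bridging sentence. The `Trajectory` class, the `compose` function, the `TRANSITION` text, and the selection rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trajectory:
    """One teacher's attempt at a problem (hypothetical structure)."""
    strategy: str   # "text" or "agentic"
    reasoning: str  # the teacher's full solution trace
    correct: bool   # whether the final answer matched the reference


# Hypothetical bridging sentence; the real wording is not given in the article.
TRANSITION = "This approach is not working. Let me try a different strategy."


def compose(text_t: Trajectory, agentic_t: Trajectory) -> Optional[str]:
    """Build one training target from two teacher attempts (assumed rule).

    - exactly one teacher correct: failed attempt, transition, correct attempt
    - both correct: keep the shorter trace
    - neither correct: skip the problem (returns None)
    """
    if text_t.correct and agentic_t.correct:
        shorter = min((text_t, agentic_t), key=lambda t: len(t.reasoning))
        return shorter.reasoning
    if not text_t.correct and not agentic_t.correct:
        return None
    if agentic_t.correct:
        wrong, right = text_t, agentic_t
    else:
        wrong, right = agentic_t, text_t
    return f"{wrong.reasoning}\n{TRANSITION}\n{right.reasoning}"
```

Under this assumed rule, a student trained on the composed targets would see examples of abandoning one strategy for the other, which is one way the switching behavior described above could be learned.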

Performance Evaluation

Agentic-R1 was evaluated on benchmarks including DeepMath-L and Combinatorics300 and compared against baseline models such as DeepSeek-R1-Distill and Qwen-2.5-Instruct. The results show that Agentic-R1 outperforms models that specialize exclusively in either tool-assisted or pure natural language reasoning. By choosing between reasoning and tool execution for each problem, it achieves higher accuracy and better computational efficiency.

Intelligent Tool Usage

Analysis of its outputs shows that Agentic-R1 activates tools selectively. It executes code on 79.2% of the computationally intensive Combinatorics300 problems but on only 52.0% of the simpler AMC problems. This adaptive behavior emerges from supervised fine-tuning alone, without explicit instructions on when to use tools, and it balances computational cost against accuracy.
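A tool usage rate of this kind can be measured by counting the share of model outputs that contain at least one tool call. The Python sketch below assumes tool calls are marked in the output by a `<code>` tag; that marker and the function name `tool_usage_rate` are hypothetical, since the article does not describe the output format.

```python
from typing import Iterable

# Hypothetical marker for a tool call; the real output format is not given.
TOOL_MARKER = "<code>"


def tool_usage_rate(outputs: Iterable[str]) -> float:
    """Return the percentage of outputs that contain at least one tool call."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    used = sum(1 for text in outputs if TOOL_MARKER in text)
    return 100.0 * used / len(outputs)
```

Running this once per benchmark would give per-dataset figures comparable to the 79.2% and 52.0% quoted above.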

Robustness Against Imperfect Teachers

Even when the agentic teacher model's accuracy was limited (48.4% on Combinatorics300), Agentic-R1's accuracy rose from 44.7% to 50.9%, ending 2.5 percentage points above its teacher. This robustness shows that DualDistill can turn imperfect guidance into a student that outperforms the source of that guidance.
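The size of these gaps follows directly from the reported figures. The short Python calculation below uses only the three percentages quoted above; the variable names are illustrative.

```python
# Accuracies on Combinatorics300 as reported in the article (percent).
teacher_acc = 48.4
before = 44.7
after = 50.9

# Differences in percentage points, rounded to one decimal place.
gain = round(after - before, 1)
margin_over_teacher = round(after - teacher_acc, 1)

print(f"gain: {gain} points, margin over teacher: {margin_over_teacher} points")
# prints "gain: 6.2 points, margin over teacher: 2.5 points"
```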

Implications for AI Reasoning

DualDistill and Agentic-R1 demonstrate a promising approach to building AI agents capable of integrating heterogeneous problem-solving methods. By combining natural language reasoning and tool-assisted computation, these models offer more reliable, efficient, and adaptable solutions to complex mathematical problems.

For more details, refer to the research paper and the project GitHub page.