DEER: A Training-Free Method Enabling Dynamic Early Exit in Large Reasoning Language Models

Researchers propose DEER, a novel training-free approach allowing large reasoning language models to dynamically exit reasoning early, reducing computation and improving accuracy.

Advances in Large Reasoning Language Models

Recent progress in large reasoning language models (LRLMs) such as DeepSeek-R1 and OpenAI o1 has enhanced complex problem-solving by extending Chain-of-Thought (CoT) generation during inference. These models leverage test-time scaling laws to explore richer, more diverse reasoning paths. However, generating very long CoT sequences is computationally inefficient and increases latency, complicating real-world deployment. Moreover, excessive reasoning can introduce redundant or irrelevant steps, leading to errors and reduced accuracy.

The Overthinking Challenge

Traditional supervised fine-tuning and reinforcement learning approaches lack dynamic control over reasoning length, often causing models to overthink. Research suggests that reasoning can often stop earlier, at so-called “pearl reasoning” points where the chain already contains enough information to answer correctly. Identifying these critical stopping points can significantly improve efficiency while maintaining or even improving model performance.

Existing Methods and Their Limitations

Approaches to improving inference efficiency fall into three categories: post-training, prompt-based, and output-based methods. Post-training methods retrain the model with variable-length CoT examples or length rewards, but this is computationally expensive and risks overfitting. Prompt-based methods adjust input prompts to control CoT length, trading conciseness against accuracy. Output-based methods stop early when multiple sampled outputs converge, but they depend on best-of-N sampling, which newer models have largely moved away from. Other early-exit strategies require additional verification models or have limited applicability.

Introducing DEER: Dynamic Early Exit in Reasoning

A team from the Institute of Information Engineering, University of Chinese Academy of Sciences, and Huawei Technologies proposed DEER, a simple, training-free method that enables LRLMs to dynamically exit reasoning early. DEER monitors key transition points, such as the generation of “Wait” tokens, and at each one prompts the model to produce a trial answer. If confidence in the trial answer is high, the model halts reasoning; otherwise, it continues.

This approach integrates smoothly with existing models such as the DeepSeek-R1 series, reducing CoT length by 31–43% and improving accuracy by 1.7–5.7% across benchmarks including MATH-500, AIME 2024, and GPQA Diamond.

How DEER Works

DEER’s architecture consists of three modules:

  • Reasoning Transition Monitor: Detects "thought switch" signals indicating critical reasoning junctures.
  • Answer Inducer: Prompts the model to generate a trial conclusion.
  • Confidence Evaluator: Assesses if the confidence in the trial answer exceeds a threshold.

If confidence is sufficient, reasoning stops early; otherwise, it proceeds.
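
Concretely, the exit loop fits in a few dozen lines. The sketch below is an illustration rather than the authors' implementation: it assumes a Hugging Face transformers-style API, scores confidence as the mean token probability of the trial answer, and treats the model name, the “Final Answer” inducer string, the 0.95 threshold, and the chunk budgets as placeholder choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative settings, not the authors' exact configuration.
MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
ANSWER_INDUCER = "\n**Final Answer**\n"  # prompt that induces a trial conclusion
CONF_THRESHOLD = 0.95                    # early-exit confidence threshold

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

@torch.no_grad()
def trial_answer_confidence(text):
    """Answer Inducer + Confidence Evaluator: decode a short trial answer
    and score it by the mean probability of its tokens."""
    ids = tokenizer(text + ANSWER_INDUCER, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32, do_sample=False,
                         output_scores=True, return_dict_in_generate=True)
    new_tokens = out.sequences[0, ids.shape[1]:]
    probs = [torch.softmax(step[0], dim=-1)[tok].item()
             for step, tok in zip(out.scores, new_tokens)]
    answer = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return answer, sum(probs) / max(len(probs), 1)

@torch.no_grad()
def deer_generate(question, max_chunks=16):
    """Reasoning Transition Monitor: generate until a thought-switch
    token ("Wait") appears, then probe whether a trial answer suffices."""
    text, answer = question, ""
    for _ in range(max_chunks):
        ids = tokenizer(text, return_tensors="pt").input_ids
        chunk = model.generate(ids, max_new_tokens=256, do_sample=False,
                               stop_strings=["Wait"], tokenizer=tokenizer)
        text = tokenizer.decode(chunk[0], skip_special_tokens=True)
        answer, conf = trial_answer_confidence(text)
        if conf >= CONF_THRESHOLD:
            break  # confident enough: exit reasoning early
    return answer
```

Here stop_strings (available in recent transformers releases) pauses decoding at the “Wait” thought-switch signal; if the trial answer's confidence clears the threshold the model stops, and otherwise reasoning resumes from where it paused.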

To offset the latency of generating trial answers, DEER uses branch-parallel decoding with dynamic cache management, improving efficiency without compromising accuracy, especially in code-generation tasks where trial answers are long.
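
The cache-sharing half of this idea can be sketched as follows, again as an assumption about implementation detail rather than the authors' code: the long reasoning prefix is prefilled once, and the trial-answer branch decodes on top of that KV cache instead of re-encoding the chain. Resuming generate from a supplied past_key_values assumes a recent transformers version, and the two branches appear sequentially here rather than truly batched.

```python
@torch.no_grad()
def trial_branch_with_shared_cache(prefix_ids, inducer_ids):
    """Reuse the reasoning prefix's KV cache for the trial answer, so
    inducing a conclusion does not re-prefill the long CoT."""
    # Prefill the shared reasoning prefix once and keep its KV cache.
    prefix_out = model(input_ids=prefix_ids, use_cache=True)
    cache = prefix_out.past_key_values

    # Decode the trial-answer branch on top of the cached prefix. In DEER
    # proper, this branch is batched with the ongoing reasoning branch so
    # both advance in the same forward passes, and the trial branch's
    # stale cache entries are evicted if reasoning continues.
    full_ids = torch.cat([prefix_ids, inducer_ids], dim=-1)
    return model.generate(full_ids, past_key_values=cache,
                          max_new_tokens=32, do_sample=False)
```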

Experimental Results

The method was tested on four reasoning benchmarks—MATH-500, AMC 2023, AIME 2024, GPQA Diamond—and programming benchmarks HumanEval and BigCodeBench. Evaluations used DeepSeek-R1-Distill-Qwen models ranging from 1.5B to 32B parameters in a zero-shot Chain-of-Thought setup.

DEER reduced reasoning length by 31–43% while increasing accuracy by 1.7–5.7% compared with standard CoT. It proved particularly effective for smaller models and simpler tasks, where early exits corrected more responses that would otherwise have been wrong. On the programming benchmarks, DEER shortened reasoning length by over 60% with little to no loss in accuracy.

Balancing Efficiency and Accuracy

This study confirms that early exit during CoT generation is feasible and beneficial. DEER’s training-free dynamic early exit strategy allows models to halt reasoning once sufficient information is gathered, balancing efficiency and performance better than traditional long CoT methods. By dynamically monitoring model confidence, DEER avoids unnecessary reasoning steps, leading to faster and more accurate model outputs across diverse tasks.
