Fractional Reasoning: Enhancing LLM Inference with Adaptive Depth Control
Fractional Reasoning introduces a model-agnostic method to adaptively control reasoning depth in LLMs, enhancing performance and efficiency on complex reasoning tasks.
Limitations of Current Test-Time Compute Strategies
Large Language Models (LLMs) have improved across a wide range of tasks, often by spending additional computation at test time to strengthen reasoning. Common strategies include sampling multiple candidate answers and aggregating them (e.g., by majority vote) or iteratively refining responses via self-reflection. However, these methods apply a uniform reasoning depth to every query, regardless of problem complexity. The result is under- or overthinking: suboptimal answers on hard problems or wasted compute on easy ones.
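As a point of reference, here is a minimal sketch of the majority-vote (self-consistency) strategy described above; `sample_answer` is a hypothetical stand-in for a temperature-sampled LLM call, not part of the paper:

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    # Hypothetical stand-in for one stochastic LLM generation; here it
    # just simulates noisy final answers for illustration.
    return random.choice(["42", "42", "42", "41"])

def majority_vote(question: str, n: int = 16) -> str:
    # Self-consistency: sample n candidates and return the most frequent
    # answer. Note the fixed budget n, applied uniformly to every query
    # regardless of difficulty -- the limitation discussed above.
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))  # usually "42"
```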
Introducing Fractional Reasoning for Dynamic Control
Stanford researchers have proposed Fractional Reasoning (FR), a training-free, model-agnostic framework for dynamically adjusting reasoning depth at inference time. FR operates on the model's internal latent states: it extracts the latent shift induced by a reasoning prompt, such as a Chain-of-Thought or reflection instruction, and reapplies that shift scaled by a tunable factor. This lets the model dial reasoning depth up or down without changing the input text and without any fine-tuning.
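The article does not reproduce the paper's exact equations, but the core operation can be sketched as steering hidden states along the direction a reasoning prompt induces. In the minimal PyTorch sketch below, the function name, the mean-pooling step, and the tensor shapes are illustrative assumptions, not the paper's verbatim formulation:

```python
import torch

def fractional_shift(h_base: torch.Tensor,
                     h_prompted: torch.Tensor,
                     alpha: float) -> torch.Tensor:
    """Steer hidden states by a scaled, prompt-induced latent shift.

    h_base:     hidden states of the bare query, shape (seq_base, dim)
    h_prompted: hidden states of the query preceded by a reasoning prompt
                (e.g., a Chain-of-Thought instruction), shape (seq_prompted, dim)
    alpha:      scaling factor; 0 = no steering, 1 = full prompt effect,
                >1 = amplified reasoning.
    """
    # Mean-pooling over positions is an illustrative simplification,
    # not necessarily the paper's exact formulation.
    shift = h_prompted.mean(dim=0) - h_base.mean(dim=0)  # (dim,) steering direction
    return h_base + alpha * shift  # broadcast the scaled shift over all positions

# Schematic demo with random tensors standing in for real hidden states.
torch.manual_seed(0)
h_base = torch.randn(12, 4096)      # 12 query tokens, hidden size 4096
h_prompted = torch.randn(20, 4096)  # query plus reasoning-prompt tokens
steered = fractional_shift(h_base, h_prompted, alpha=0.5)  # "half-strength" reasoning
```

In a real pipeline the steered states would be injected back into the forward pass (e.g., via hooks), with alpha above 1 amplifying the prompt's effect for harder problems and values below 1 attenuating it for easy ones; this is what makes the method training-free and model-agnostic, since it needs only read/write access to hidden states.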
Enhancing Breadth and Depth of Reasoning
FR enhances both primary types of test-time scaling: breadth-based methods such as Best-of-N sampling and majority voting, and depth-based methods such as self-reflection. For breadth, varying the scaling factor across samples diversifies the candidate solutions being explored; for depth, it controls how strongly each refinement step is applied, yielding more efficient and accurate inference.
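To make the breadth-based use concrete, one plausible integration (not necessarily the paper's exact procedure) replaces N identically-distributed samples with one sample per steering strength; `generate` below is an assumed callable that runs the FR-steered model and extracts a final answer:

```python
from collections import Counter
from typing import Callable

def fr_majority_vote(question: str,
                     generate: Callable[[str, float], str],
                     alphas: list[float]) -> str:
    # Breadth-based scaling with FR: one candidate per reasoning depth,
    # then a majority vote over the pooled answers.
    answers = [generate(question, alpha) for alpha in alphas]
    return Counter(answers).most_common(1)[0][0]

# Example call, sweeping from attenuated (0.5) to amplified (1.5) reasoning:
# answer = fr_majority_vote(q, generate, alphas=[0.5, 0.75, 1.0, 1.25, 1.5])
```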
Benchmark Performance and Model Versatility
Evaluations on the multi-step reasoning benchmarks GSM8K, MATH500, and GPQA show that FR consistently outperforms standard test-time compute approaches. Experiments with the instruction-tuned open-source models Qwen2.5-7B-Instruct and LLaMA-3.1-8B-Instruct yield significant accuracy gains, and FR remains effective on specialized reasoning models such as DeepSeek-R1-Distill-Qwen-7B, confirming its general applicability.
Behavioral Insights and Scaling Effects
Analysis shows that increasing the scaling factor leads to longer, more detailed multi-step reasoning outputs, and that FR steers model behavior in a predictable, continuous manner. Performance improvements also scale with the number of generations, consistently exceeding majority-vote baselines across sampling budgets.
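A simple diagnostic one could run to check the length trend described above (the helpers `generate` and `count_tokens` are assumptions, as in the earlier sketches):

```python
from typing import Callable

def reasoning_length_sweep(question: str,
                           generate: Callable[[str, float], str],
                           count_tokens: Callable[[str], int],
                           alphas: list[float]) -> dict[float, int]:
    # Generate one response per scaling factor and record its token count;
    # per the paper's analysis, lengths should grow with alpha.
    return {alpha: count_tokens(generate(question, alpha)) for alpha in alphas}
```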
Future Directions
While FR represents a major step toward adaptive and efficient LLM inference, current limitations include reliance on predefined reasoning prompts and manual tuning of scaling factors. Future work aims to develop automatic policies for dynamic inference depth selection, enabling fully autonomous adaptive reasoning in LLMs.
For full details, see the original research paper from the Stanford University team.