Falcon-H1R-7B: A Compact Powerhouse in Reasoning
TII's Falcon-H1R-7B leads math and coding benchmarks at just 7B parameters.
Overview of Falcon-H1R-7B
The Technology Innovation Institute (TII) in Abu Dhabi has released Falcon-H1R-7B, a 7B-parameter reasoning-specialized model that matches or exceeds many 14B to 47B reasoning models on math, code, and general benchmarks while remaining compact and efficient. The model builds on Falcon-H1-7B Base and is available on Hugging Face under the Falcon-H1R collection.
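For readers who want to try the model, below is a minimal loading sketch using the Hugging Face transformers library. The repository id tiiuae/Falcon-H1R-7B is an assumption inferred from the collection name; check the Falcon-H1R collection on Hugging Face for the exact identifier.

```python
# Minimal sketch: loading the model with Hugging Face transformers.
# The repo id "tiiuae/Falcon-H1R-7B" is an assumption; check the
# Falcon-H1R collection on Hugging Face for the published name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1R-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # place layers on available GPUs/CPU
    trust_remote_code=True,  # hybrid Mamba2 blocks may need custom code
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```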
Architectural Innovations
Falcon-H1R-7B integrates three design choices: a hybrid architecture that pairs Transformer attention with a Mamba2 backbone, support for a 256k-token context window, and a training methodology that blends supervised long-form reasoning with reinforcement learning via GRPO.
Hybrid Transformer with Mamba2 Architecture
Falcon-H1R-7B is a causal decoder that combines Transformer attention layers with Mamba2 state-space components. The attention blocks handle standard attention-based reasoning, while the Mamba2 layers provide linear-time sequence modeling and more efficient memory use over long contexts.
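The sketch below is a conceptual illustration of how an attention path and a linear-time sequence mixer can coexist in one decoder block. It is not TII's implementation; the SSMStandIn class is only a placeholder for a real Mamba2 selective state-space layer.

```python
# Conceptual sketch of a hybrid decoder block (not TII's implementation).
# A real Mamba2 layer performs a selective state-space scan; SSMStandIn
# below is only a placeholder showing where that mixer would sit.
import torch
import torch.nn as nn

class SSMStandIn(nn.Module):
    """Placeholder for a Mamba2-style linear-time sequence mixer."""
    def __init__(self, dim: int):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=4, padding=3, groups=dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, seq, dim)
        u, gate = self.in_proj(x).chunk(2, dim=-1)
        u = self.conv(u.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return self.out_proj(u * torch.sigmoid(gate))

class HybridBlock(nn.Module):
    """One decoder block mixing attention with an SSM-style path."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ssm = SSMStandIn(dim)

    def forward(self, x):
        h = self.norm1(x)
        # Causal self-attention path (quadratic in sequence length).
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool,
                                     device=x.device), diagonal=1)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        # Linear-time sequence-mixing path (Mamba2 in the real model).
        return x + self.ssm(self.norm2(x))

x = torch.randn(1, 16, 256)
print(HybridBlock(256, 8)(x).shape)   # torch.Size([1, 16, 256])
```

The point of the hybrid is that the attention path pays a quadratic cost in sequence length while the state-space path scales linearly, which is what makes very long contexts practical.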
Training Protocol for Reasoning Tasks
Two-Stage Training Pipeline
- First Stage: Cold-start supervised fine-tuning on Falcon-H1-7B Base, using long-form reasoning traces across three domains: mathematics, coding, and science.
- Second Stage: Refinement with GRPO, which rewards correct reasoning chains using symbolic checks for math answers and execution tests for code (a sketch of such verifiable rewards follows this list).
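Below is a minimal sketch of the kind of verifiable rewards described above: a symbolic equivalence check for math answers and an execution test for generated code. This is not TII's reward code; the function names and the use of sympy and subprocess are illustrative choices.

```python
# Illustrative sketch of verifiable rewards: a symbolic check for math
# answers and an execution test for code. Not TII's implementation;
# function names here are our own.
import subprocess
import sys
import sympy

def math_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 if the two expressions are symbolically equivalent."""
    try:
        diff = sympy.simplify(sympy.sympify(model_answer) - sympy.sympify(reference))
        return 1.0 if diff == 0 else 0.0
    except (sympy.SympifyError, TypeError):
        return 0.0

def code_reward(generated_code: str, test_snippet: str, timeout: float = 5.0) -> float:
    """Return 1.0 if the generated code passes the unit-test snippet."""
    program = generated_code + "\n" + test_snippet
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True, timeout=timeout,
        )
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

print(math_reward("2*(x + 1)", "2*x + 2"))                  # 1.0
print(code_reward("def add(a, b):\n    return a + b",
                  "assert add(2, 3) == 5"))                 # 1.0
```

In a GRPO-style setup, rewards like these are computed for a group of sampled completions per prompt and converted into group-relative advantages, which avoids training a separate value model.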
Performance Benchmarks
Falcon-H1R-7B posts competitive results across math and coding benchmarks.
- On math evaluations it scores 73.96%, outperforming larger models such as Qwen3-32B.
- Individual results include 88.1% on AIME 24 and 68.6% on LiveCodeBench v6.
General Reasoning Assessment
The model achieves 49.48% overall on general reasoning tasks, demonstrating that a well-optimized 7B model can rival much larger counterparts.
Inference Throughput and Testing Efficiency
Falcon-H1R-7B sustains an inference throughput of roughly 1,000 to 1,800 tokens per second, and its Deep Think methodology uses test-time scaling to push accuracy higher across benchmarks while keeping inference efficient.
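The article does not detail how Deep Think works internally, so the sketch below only illustrates the general idea of test-time scaling via a self-consistency vote: sample several reasoning chains and return the most common final answer. The generate_answer callable is a hypothetical stand-in for a sampled model call.

```python
# Generic test-time scaling sketch (self-consistency voting). This does
# not describe Deep Think's internals; it only shows the idea of spending
# more inference tokens to improve accuracy.
from collections import Counter
from typing import Callable
import random

def self_consistency(prompt: str,
                     generate_answer: Callable[[str], str],
                     n_samples: int = 16) -> str:
    """Sample several reasoning chains and return the majority answer."""
    answers = [generate_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Toy usage with a fake sampler that is right most of the time.
def fake_sampler(prompt: str) -> str:
    return random.choices(["42", "41"], weights=[0.7, 0.3])[0]

print(self_consistency("What is 6 * 7?", fake_sampler))   # usually "42"
```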
Key Takeaways
- Falcon-H1R-7B operates at 7B parameters while supporting a 256k-token context window.
- The two-stage training pipeline enhances capabilities in reasoning tasks.
- It demonstrates strong performance in math and coding benchmarks, rivaling models with a much larger parameter count.
- The hybrid architecture yields high inference throughput, roughly 1,000 to 1,800 tokens per second.