Meta Unveils MobileLLM-R1: Sub‑1B Edge Reasoner That Outperforms Larger Open Models
Meta has published MobileLLM-R1, a family of lightweight edge reasoning models spanning 140M to 950M parameters, on Hugging Face. The series targets efficient mathematical, coding, and scientific reasoning on devices with constrained compute and memory, trading general chat fluency for focused reasoning accuracy and low inference cost.
Architecture and design
The flagship MobileLLM-R1-950M packs several architectural optimizations aimed at reducing compute and memory while preserving representational power:
- 22 Transformer layers with 24 attention heads and 6 grouped KV heads.
- Embedding dimension: 1536; hidden dimension: 6144.
- Grouped-Query Attention (GQA) to lower compute and memory use.
- Block-wise weight sharing to cut parameters without large latency penalties.
- SwiGLU activations to improve expressiveness in smaller models.
- Context length: 4K for base models and 32K for post-trained variants.
- 128K vocabulary with shared input/output embeddings.
These design choices emphasize a compact footprint suitable for edge deployment, minimizing KV-cache and runtime requirements where possible.
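As a sanity check, these published dimensions roughly reproduce the advertised size. The sketch below is a back-of-the-envelope count assuming a standard GQA-plus-SwiGLU decoder layout; it counts the tied embeddings once and ignores norms, biases, and any savings from block-wise weight sharing.

```python
# Back-of-the-envelope parameter count for MobileLLM-R1-950M,
# assuming a standard GQA + SwiGLU decoder layout (norms and biases
# ignored; block-wise weight sharing not modeled).
vocab = 128_000        # ~128K vocabulary, input/output embeddings tied
d_model = 1536         # embedding dimension
d_ff = 6144            # hidden (FFN) dimension
n_layers = 22
n_heads = 24
n_kv_heads = 6         # grouped KV heads
head_dim = d_model // n_heads  # 64

# Tied embeddings are counted once.
embed = vocab * d_model

# Attention: full-width Q and O projections, narrower K/V under GQA.
q_proj = d_model * (n_heads * head_dim)
kv_proj = 2 * d_model * (n_kv_heads * head_dim)
o_proj = (n_heads * head_dim) * d_model
attn = q_proj + kv_proj + o_proj

# SwiGLU FFN: gate and up projections plus a down projection.
ffn = 2 * d_model * d_ff + d_ff * d_model

total = embed + n_layers * (attn + ffn)
print(f"~{total / 1e6:.0f}M parameters")  # ≈ 949M
```

The total lands almost exactly on the reported 0.949B.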
Training efficiency
MobileLLM-R1 stands out for data efficiency. The family was trained on roughly 4.2 trillion tokens in total; by comparison, Qwen3's 0.6B model was trained on about 36 trillion. In other words, MobileLLM-R1 achieves comparable or better reasoning performance with only around 11.7% (4.2T / 36T) of the training data Qwen3 relied on. After base pretraining, Meta applied supervised fine-tuning on math, coding, and structured-reasoning datasets to sharpen the model's capabilities in those domains.
This efficiency reduces training cost and resource demands, making the models easier to develop and iterate on for targeted reasoning tasks.
Benchmark performance
On a range of math, reasoning, and coding benchmarks, MobileLLM-R1-950M posts significant gains relative to several fully open alternatives:
- MATH (MATH500): roughly 5× higher accuracy than OLMo-1.24B and about 2× higher than SmolLM2-1.7B.
- Reasoning and coding (GSM8K, AIME, LiveCodeBench): MobileLLM-R1-950M matches or surpasses Qwen3-0.6B despite using far fewer training tokens.
Overall, the R1-950M delivers performance typically associated with larger architectures while maintaining a sub‑billion parameter footprint.
Limitations and trade-offs
MobileLLM-R1 is optimized for structured reasoning, which leads to trade-offs:
- Strengths: math, coding, and formal/scientific reasoning tasks.
- Weaknesses: general conversation, commonsense question answering, and open-ended creative tasks, where it trails larger general-purpose LLMs.
- Licensing: distributed under the FAIR NC (non-commercial) license, which restricts production use in commercial settings.
- Long-context modes (32K) increase KV-cache pressure and memory demand during inference, which can limit some edge deployments; a rough estimate follows below.
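To put that last point in numbers, here is a rough KV-cache estimate at the full 32K context, assuming fp16 caches and the GQA shape listed earlier (22 layers, 6 KV heads, 64-dim heads); actual runtimes may use different precisions or paging schemes.

```python
# Rough KV-cache size at full 32K context for R1-950M, assuming fp16
# caches and the GQA shape above (22 layers, 6 KV heads, head dim 64).
n_layers, n_kv_heads, head_dim = 22, 6, 64
seq_len, bytes_per_elem = 32_768, 2  # fp16

# Keys and values are cached per layer:
# 2 * layers * kv_heads * head_dim * seq_len elements.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
print(f"~{kv_bytes / 2**30:.2f} GiB")  # ≈ 1.03 GiB per 32K-token sequence
```

Roughly a gigabyte of cache per full-length sequence is manageable on many phones and SBCs, but it competes directly with the model weights for memory.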
Comparison with Qwen3, SmolLM2, and OLMo
A performance snapshot for post-trained models on key benchmarks (values as reported):

| Model | Params | Train tokens | MATH500 | GSM8K | AIME'24 | AIME'25 | LiveCodeBench |
|-------|--------|--------------|---------|-------|---------|---------|---------------|
| MobileLLM-R1-950M | 0.949B | 4.274T | 74.0 | 67.5 | 15.5 | 16.3 | 19.9 |
| Qwen3-0.6B | 0.596B | 36.0T | 73.0 | 79.2 | 11.3 | 17.0 | 14.9 |
| SmolLM2-1.7B-Instruct | 1.71B | ~11.0T | 19.2 | 41.8 | 0.3 | 0.1 | 4.4 |
| OLMo-2-1B-Instruct | 1.48B | ~3.95T | 19.2 | 69.7 | 0.6 | 0.1 | 0.0 |
Key observations:
- R1-950M matches or slightly exceeds Qwen3-0.6B on MATH500 (74.0 vs 73.0) while being trained on roughly 8.4× fewer tokens (4.274T vs 36.0T).
- Gaps versus SmolLM2 and OLMo are substantial across structured reasoning tasks.
- Qwen3 retains an edge on GSM8K (79.2 vs 67.5), but the margin is modest relative to MobileLLM-R1's training-efficiency advantage.
Implications for edge reasoning
MobileLLM-R1 highlights a shift toward smaller, domain-optimized models that prioritize efficiency over scale. For organizations and developers building math- or code-focused assistants on-device, MobileLLM-R1 offers a compelling balance of accuracy and resource usage, provided its licensing and conversational limitations align with project constraints. The models are available on Hugging Face along with related tutorials and resources on GitHub and community channels.
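For a quick start, the sketch below shows a minimal inference pass with Hugging Face transformers. The repo id is an assumption based on Meta's usual facebook/ namespace, and the chat-template call presumes the post-trained variant ships one; check the model card for the exact identifier, required transformers version, and recommended generation settings.

```python
# Minimal inference sketch with Hugging Face transformers.
# The repo id below is an assumption based on Meta's usual "facebook/"
# namespace; consult the model card for the exact id and chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/MobileLLM-R1-950M"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Compute 37 * 43. Show your reasoning."
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```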