Baidu Unveils ERNIE-4.5-21B-A3B-Thinking: Compact MoE with 128K Context for Deep Reasoning

ERNIE-4.5-21B-A3B-Thinking is Baidu AI Research’s new reasoning-focused large language model that aims to balance deep reasoning capacity with deployment efficiency. The model belongs to the ERNIE-4.5 family and uses a Mixture-of-Experts backbone to concentrate compute where it matters most.

Design and architecture

The model is a sparse Mixture-of-Experts (MoE) with 21 billion total parameters, of which only about 3 billion are active per token. A learned router selects a subset of experts for each token, reducing computation while enabling expert specialization. To improve expert utilization and training stability, the team applies a router orthogonalization loss and a token-balanced loss that together encourage diverse expert activation.
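
To make the routing mechanism concrete, here is a minimal PyTorch sketch of generic top-k expert routing with a Switch-Transformer-style load-balancing auxiliary loss. The expert count, top-k value, and hidden sizes are illustrative assumptions, and the auxiliary term is a common stand-in for, not a reproduction of, the orthogonalization and token-balanced losses described above.

```python
# Minimal sketch of top-k MoE routing (not Baidu's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=64, top_k=2, d_ff=1024):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                 # x: [tokens, d_model]
        probs = F.softmax(self.router(x), dim=-1)         # [tokens, n_experts]
        topv, topi = probs.topk(self.top_k, dim=-1)
        topv = topv / topv.sum(dim=-1, keepdim=True)      # renormalize gate weights

        out = torch.zeros_like(x)
        for k in range(self.top_k):                       # only selected experts run
            for e in range(len(self.experts)):
                mask = topi[:, k] == e
                if mask.any():
                    out[mask] += topv[mask, k:k+1] * self.experts[e](x[mask])

        # Switch-Transformer-style balancing term: pushes the router toward
        # uniform expert usage (a stand-in for the losses named above).
        frac_tokens = F.one_hot(topi[:, 0], len(self.experts)).float().mean(0)
        frac_probs = probs.mean(0)
        aux_loss = len(self.experts) * (frac_tokens * frac_probs).sum()
        return out, aux_loss
```

Because only the selected experts execute a forward pass for each token, a 21B-parameter model of this shape pays roughly the per-token compute of a ~3B dense model.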

This architecture positions ERNIE-4.5-21B-A3B-Thinking between small dense models and ultra-large dense systems, reflecting the research hypothesis that roughly 3B active parameters per token can be a practical sweet spot for reasoning performance versus deployment cost.

Long-context reasoning

A standout capability is native support for a 128K-token context window. This makes the model suitable for processing long documents, multi-file codebases, or extended chains of thought where maintaining context across many steps is critical.

To enable this, the training pipeline progressively scales Rotary Position Embeddings (RoPE) by increasing the frequency base from 10K up to 500K. Complementary optimizations such as FlashMask attention and memory-efficient scheduling help keep long-context operations feasible in practice.
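
The intuition behind raising the RoPE base can be checked in a few lines: a larger base slows the rotation of the lowest-frequency dimension pairs, so positional phases remain distinguishable over much longer sequences. The sketch below only illustrates that effect; the head dimension is an assumption, and the actual progressive schedule is part of training, not inference.

```python
import torch

def rope_inv_freq(dim: int, base: float) -> torch.Tensor:
    """Standard RoPE inverse frequencies for a head of size `dim`."""
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

head_dim = 128  # illustrative assumption
for base in (10_000, 100_000, 500_000):
    inv_freq = rope_inv_freq(head_dim, base)
    # Wavelength of the slowest-rotating pair: how many positions it takes
    # for that pair's phase to wrap around.
    longest = (2 * torch.pi / inv_freq[-1]).item()
    print(f"base={base:>7}: slowest pair repeats every {longest:,.0f} positions")
```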

Training recipe and reasoning focus

The model follows the multi-stage ERNIE-4.5 training recipe with a text-first approach.

After pretraining, the team applies supervised fine-tuning (SFT) across disciplines like mathematics, logic, coding, and science, followed by Progressive Reinforcement Learning (PRL). Reinforcement stages start with logic, then expand into mathematics and programming, and finally broader reasoning tasks. Unified Preference Optimization (UPO) combines preference learning with PPO to stabilize alignment and reduce reward hacking.
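
Schematically, that post-training flow reads as a pipeline like the sketch below. Every function here is a stub that only names a stage; none of them are Baidu's actual training APIs.

```python
# Hypothetical sketch of the staged post-training flow described in the text.
def sft(model, corpus):          # supervised fine-tuning stand-in
    return model

def rl_stage(model, domain):     # one Progressive RL stage stand-in
    return model

def upo(model, preferences):     # Unified Preference Optimization stand-in
    return model

def post_train(model, data):
    # SFT across reasoning-heavy disciplines.
    model = sft(model, data["math"] + data["logic"] + data["code"] + data["science"])
    # Progressive RL: logic first, then math and programming, then broader tasks.
    for domain in ("logic", "math", "code", "general_reasoning"):
        model = rl_stage(model, domain)
    # UPO: preference learning combined with PPO to stabilize alignment.
    return upo(model, data["preferences"])
```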

Tool integration and function calling

ERNIE-4.5-21B-A3B-Thinking supports structured tool and function calling, and is designed for integration with runtimes and libraries such as vLLM, Transformers 4.54+, and FastDeploy. Built-in function calling lets the model reason over long contexts while invoking external APIs or computation, which is essential for program synthesis, symbolic reasoning, and multi-agent workflows.
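
As a quick orientation, the snippet below sketches plain generation through Hugging Face Transformers using the standard chat-template pattern. The prompt, dtype, and generation settings are illustrative assumptions, not recommended defaults.

```python
# Minimal generation sketch with Transformers (>= 4.54); settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baidu/ERNIE-4.5-21B-A3B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Prove that the sum of two odd integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens (reasoning trace plus answer).
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With vLLM, the same checkpoint can typically be served via `vllm serve baidu/ERNIE-4.5-21B-A3B-Thinking` and queried through the OpenAI-compatible endpoint, the usual path for tool and function calling in production.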

Benchmark performance and comparisons

On reasoning benchmarks, the model shows notable gains across logical reasoning, mathematics, scientific QA, and programming tasks. It demonstrates improved accuracy on multi-step reasoning datasets and performance competitive with larger dense models on STEM tasks. The results suggest that sparse MoE routing can amplify reasoning specialization without requiring trillion-scale dense parameter counts.

Compared with other reasoning-focused LLMs such as OpenAI o3, Anthropic Claude 4, DeepSeek-R1, and Qwen3, ERNIE-4.5-21B-A3B-Thinking offers an alternative trade-off: sparse activation for compute efficiency, native long-context training up to 128K, and an Apache-2.0 license that eases commercial adoption.

Availability

The model is released under the Apache-2.0 license and is available on Hugging Face for research and commercial use: https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking

For more details and experiments, see the project page and paper linked on the model hub.