MiniMax AI Launches MiniMax-M1: A 456B Parameter Hybrid AI Model for Extended Context and Reinforcement Learning
MiniMax AI has unveiled MiniMax-M1, a 456B-parameter open-weight hybrid-attention model that handles contexts of up to one million tokens and is trained with large-scale reinforcement learning, delivering notable gains in inference efficiency and scalability.
Tackling Long-Context Reasoning Challenges
Large AI reasoning models are designed not only to understand language but also to perform multi-step reasoning that requires maintaining attention over long sequences and comprehending complex contexts. As AI demands grow in practical and software development domains, researchers have pursued architectures capable of processing extended input lengths while sustaining coherent reasoning chains without incurring prohibitive computational costs.
Computational Limitations of Traditional Transformer Models
Traditional transformer models use a softmax attention mechanism that scales quadratically with input length, making it computationally expensive to handle very long sequences. This limits their effectiveness in real-time or cost-sensitive environments where long context and efficient inference are necessary.
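The quadratic-versus-linear distinction is easiest to see with a rough cost model. The sketch below is a back-of-envelope comparison only; the hidden dimension and constant factors are illustrative and are not taken from MiniMax-M1.

```python
# Back-of-envelope comparison of how attention cost grows with sequence length.
# The hidden size and constant factors are illustrative, not MiniMax-M1's actual figures.

def softmax_attention_flops(seq_len: int, d_model: int = 4096) -> float:
    # QK^T and the attention-weighted sum over V each cost O(n^2 * d),
    # so total cost grows quadratically with sequence length n.
    return 2.0 * seq_len ** 2 * d_model

def linear_attention_flops(seq_len: int, d_model: int = 4096) -> float:
    # Kernelized/linear attention reorders the products to O(n * d^2),
    # so total cost grows only linearly with sequence length n.
    return 2.0 * seq_len * d_model ** 2

for n in (8_000, 100_000, 1_000_000):
    ratio = softmax_attention_flops(n) / linear_attention_flops(n)
    print(f"seq_len={n:>9,}: softmax/linear cost ratio ≈ {ratio:,.0f}x")
```

The ratio grows in direct proportion to sequence length, which is why quadratic attention becomes the dominant cost once inputs reach hundreds of thousands of tokens.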
Existing Approaches and Their Drawbacks
Various methods have been explored to alleviate these constraints, including sparse and linear attention variants, state-space models, and recurrent networks. However, these approaches often suffer from complexity or scalability issues, limiting their adoption in top-tier reasoning models. Additionally, some advanced models like Tencent’s Hunyuan-T1 remain closed-source, restricting broader research and validation.
Introducing MiniMax-M1: An Open-Weight, Scalable Model
MiniMax AI developed MiniMax-M1, a large-scale reasoning model with 456 billion total parameters, of which 45.9 billion are activated per token. It supports context lengths of up to one million tokens, eight times the context of DeepSeek R1, and requires only about 25% of DeepSeek R1's computational operations at a generation length of 100,000 tokens. Trained with large-scale reinforcement learning on tasks spanning mathematics, coding, and software engineering, MiniMax-M1 advances practical long-context AI.
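For readers who want to check the reported figures, the arithmetic behind the activation ratio and the context-length comparison is shown below. The DeepSeek R1 context length of 128,000 tokens is an assumption used here to reproduce the stated eightfold difference.

```python
# Reader-side arithmetic on the figures reported above; not an official spec sheet.

total_params = 456e9       # total parameters
active_params = 45.9e9     # parameters activated per token via mixture-of-experts routing
m1_context = 1_000_000     # MiniMax-M1 maximum context length (tokens)
r1_context = 128_000       # DeepSeek R1 context length (assumed for the 8x comparison)

print(f"Active fraction per token: {active_params / total_params:.1%}")   # ~10.1%
print(f"Context length vs. DeepSeek R1: {m1_context / r1_context:.1f}x")  # ~7.8x, i.e. roughly 8x
```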
Hybrid Attention Architecture: Combining Lightning and Softmax Attention
MiniMax-M1 employs a hybrid attention design: a transformer block with traditional softmax attention follows every seven blocks that use lightning attention. Lightning attention is an I/O-aware adaptation of linear attention that sharply reduces computational demands while maintaining performance on very long contexts. This design enables efficient scaling to hundreds of thousands of tokens.
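A minimal structural sketch of this interleaving is given below, assuming a PyTorch-style module stack. The block internals are simplified placeholders (the linear-attention block here is just a residual projection), and the block count and hidden size are arbitrary; the sketch illustrates only the 7:1 interleaving pattern, not the actual MiniMax-M1 implementation.

```python
import torch
import torch.nn as nn

class SoftmaxAttentionBlock(nn.Module):
    """Placeholder for a standard softmax-attention transformer block."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return x + out

class LightningAttentionBlock(nn.Module):
    """Placeholder standing in for an I/O-aware linear-attention block."""
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        return x + self.proj(x)

def build_hybrid_stack(num_blocks: int = 16, softmax_every: int = 8) -> nn.Sequential:
    """Interleave blocks so one softmax block follows every seven lightning blocks."""
    return nn.Sequential(*[
        SoftmaxAttentionBlock() if (i + 1) % softmax_every == 0 else LightningAttentionBlock()
        for i in range(num_blocks)
    ])

model = build_hybrid_stack()
x = torch.randn(2, 128, 512)   # (batch, sequence, hidden)
print(model(x).shape)          # torch.Size([2, 128, 512])
```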
CISPO Algorithm Enhances Reinforcement Learning Training
The CISPO algorithm, introduced by the researchers, stabilizes training by clipping importance sampling weights rather than token updates. This innovation allows stable and consistent training even with off-policy updates. In benchmarks, CISPO doubled training speed compared to DAPO. The full reinforcement learning cycle for MiniMax-M1 was completed in three weeks using 512 H800 GPUs, costing approximately $534,700.
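A hedged sketch of the core idea, in PyTorch-style code: the importance-sampling ratio is clipped and detached so that it scales, but never zeroes out, each token's policy-gradient contribution. The clipping bound, normalization, and tensor shapes below are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def cispo_style_loss(logp_new: torch.Tensor,    # log-probs under the current policy, shape (num_tokens,)
                     logp_old: torch.Tensor,    # log-probs under the rollout (behavior) policy
                     advantages: torch.Tensor,  # per-token advantage estimates
                     eps_high: float = 0.2) -> torch.Tensor:
    # Importance-sampling ratio between the current and rollout policies.
    ratio = torch.exp(logp_new - logp_old)
    # Clip the IS weight itself and detach it: the weight is bounded, but gradients
    # still flow through logp_new for every token, so no token update is dropped.
    weight = torch.clamp(ratio, max=1.0 + eps_high).detach()
    # REINFORCE-style objective weighted by the clipped IS weight (negated for minimization).
    return -(weight * advantages * logp_new).mean()
```

The contrast with PPO/DAPO-style clipping is that those objectives clip the per-token update, which removes the gradient signal from out-of-range tokens entirely, whereas here every token keeps contributing to the update.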
Training Dataset and Performance Benchmarks
MiniMax-M1 was trained on 41 logic tasks generated by the SynLogic framework and on real-world software engineering scenarios derived from SWE-bench, using execution-based rewards to strengthen coding performance. In benchmarks, MiniMax-M1 outperformed DeepSeek-R1 and Qwen3-235B in software engineering, long-context processing, and agentic tool use. Although it lagged slightly behind DeepSeek-R1-0528 in math and coding contests, it surpassed OpenAI o3 and Claude 4 Opus on long-context understanding and outperformed Gemini 2.5 Pro on the TAU-bench agentic tool-use evaluation.
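As an illustration of what an execution-based reward can look like in this kind of coding environment, the sketch below runs a project's test suite against a candidate patch and scores the outcome. The test command, timeout, and binary reward values are hypothetical choices, not the setup described in the paper.

```python
import subprocess

def execution_reward(repo_dir: str, test_command=("pytest", "-q"), timeout_s: int = 600) -> float:
    """Return 1.0 if the test suite passes in the patched repository, else 0.0."""
    try:
        result = subprocess.run(list(test_command), cwd=repo_dir,
                                capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0
```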
A Transparent and Scalable Step Forward
MiniMax-M1 sets a new standard by combining transparency with scalability, addressing both inference efficiency and training stability challenges. This open-weight model offers practical solutions for integrating large-scale reasoning models into real-world applications, paving the way for future advancements in AI long-context reasoning.
For more details, explore the paper, model, and GitHub page. All credit goes to the MiniMax AI research team.