AI2 Unveils Olmo 3: Open 7B and 32B LLMs with 65K Context and Full Data Transparency

AI2 releases Olmo 3, an open 7B and 32B LLM family built on the Dolma 3 data suite, with a 65,536 token context and a fully transparent training pipeline. Variants include reasoning, instruction-following, and RL-ready models for reproducible research.

What is Olmo 3?

Olmo 3 is a fully open family of dense transformer models released by the Allen Institute for AI (AI2). The suite includes 7B and 32B parameter variants, each offering a 65,536 token context window and a transparent, end-to-end model flow. AI2 publishes the data recipes, intermediate checkpoints, training code, post-training pipelines, and evaluation tooling to enable reproducible research and fine-grained inspection.
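
For orientation, the sketch below shows how a released checkpoint could be loaded with the Hugging Face transformers library. The model ID allenai/Olmo-3-7B is a placeholder assumption, not a confirmed repository name, and should be swapped for the official one.

    # Minimal sketch: loading an Olmo 3 checkpoint with Hugging Face transformers.
    # The model ID "allenai/Olmo-3-7B" is an assumed placeholder, not a confirmed repo name.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "allenai/Olmo-3-7B"  # hypothetical; substitute the official checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

    # The 65,536 token context window should be reflected in the model config.
    print(model.config.max_position_embeddings)

    prompt = "Explain what makes a language model 'fully open'."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))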

Dolma 3 data suite and staged curriculum

At the heart of Olmo 3 is the Dolma 3 data collection and a staged training curriculum. Dolma 3 Mix is a 5.9T token pretraining pool with web text, scientific PDFs, code repositories, and other natural data. From that pool AI2 builds two higher-quality subsets: Dolma 3 Dolmino Mix and Dolma 3 Longmino Mix.

The training schedule uses Dolma 3 Mix for the main pretraining of Olmo 3-Base. A 100B token Dolmino mid-training set focuses on math, code, instruction following, reading comprehension, and thinking-oriented tasks. The Longmino extension adds 50B tokens for the 7B model and 100B tokens for the 32B model, prioritizing long documents and scientific PDFs processed via olmOCR. This staged curriculum is what enables stable training at the 65K token context length.
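
To make the staged schedule concrete, here is an illustrative encoding of the token budgets described above; the structure and field names are assumptions for exposition, not AI2's actual training configuration format.

    # Illustrative sketch of the Olmo 3 staged data curriculum (not AI2's config format).
    # Token budgets come from the description above; all field names are assumptions.
    OLMO3_CURRICULUM = [
        {"stage": "pretraining", "mix": "Dolma 3 Mix", "tokens": 5.9e12,
         "focus": "web text, scientific PDFs, code repositories"},
        {"stage": "mid-training", "mix": "Dolma 3 Dolmino Mix", "tokens": 100e9,
         "focus": "math, code, instruction following, reading comprehension, thinking tasks"},
        {"stage": "long-context extension", "mix": "Dolma 3 Longmino Mix",
         "tokens": {"7B": 50e9, "32B": 100e9},
         "focus": "long documents and scientific PDFs processed via olmOCR"},
    ]

    for stage in OLMO3_CURRICULUM:
        print(f"{stage['stage']}: {stage['mix']} -> {stage['tokens']} tokens")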

Large-scale training infrastructure

Olmo 3-Base 7B was trained on Dolma 3 Mix using 1,024 H100 GPUs, achieving around 7,700 tokens per device per second. Subsequent mid and long-context stages used smaller clusters: 128 H100s for Dolmino mid training and 256 H100s for the Longmino long-context extension.
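
A rough back-of-the-envelope reading of these figures, simplified to ignore restarts, evaluation pauses, and data-pipeline overhead:

    # Back-of-the-envelope throughput estimate for the Olmo 3-Base 7B pretraining stage.
    # Numbers come from the figures above; real wall-clock time also depends on
    # restarts, checkpointing, and data-pipeline overhead not modeled here.
    gpus = 1024                      # H100 GPUs
    tokens_per_gpu_per_sec = 7_700   # reported per-device throughput
    pretraining_tokens = 5.9e12      # Dolma 3 Mix token budget

    aggregate_tps = gpus * tokens_per_gpu_per_sec        # ~7.9M tokens/sec cluster-wide
    days = pretraining_tokens / aggregate_tps / 86_400   # 86,400 seconds per day

    print(f"Aggregate throughput: {aggregate_tps / 1e6:.1f}M tokens/s")
    print(f"Estimated pretraining time: {days:.1f} days")   # roughly 8-9 days under these assumptions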

Model family and variants

The Olmo 3 family consists of several variants built on the same base models and staged recipe:

  • Olmo 3-Base 7B and 32B: Dense foundation models trained through the Dolma 3 curriculum with long context support.
  • Olmo 3-Think 7B and 32B: Reasoning-focused models that apply a three-stage post-training pipeline of supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning with Verifiable Rewards (RLVR) within the OlmoRL framework (a minimal sketch of a verifiable reward appears after this list).
  • Olmo 3-Instruct 7B: Tuned for instruction following, multi-turn chat, and tool use using a Dolci Instruct SFT/DPO/RL pipeline.
  • Olmo 3-RL Zero 7B: A clean RL pathway for researchers that separates pretraining data from RL data and uses Dolci RL Zero datasets decontaminated from Dolma 3.
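
The key property of the RLVR stage is that rewards come from automatic checks rather than a learned reward model. The snippet below is a minimal illustration of such a verifiable reward, assuming a hypothetical convention in which the model ends its reasoning with a final line of the form "Answer: <value>"; it is not AI2's actual reward implementation.

    # Minimal sketch of a "verifiable reward" in the RLVR sense: the reward is computed
    # by an automatic check rather than a learned reward model. The answer-extraction
    # convention below (a final line "Answer: <value>") is an illustrative assumption.
    import re

    def math_reward(completion: str, reference_answer: str) -> float:
        """Return 1.0 if the model's final answer matches the reference exactly, else 0.0."""
        match = re.search(r"Answer:\s*(.+)\s*$", completion.strip())
        if match is None:
            return 0.0
        return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

    # Example: a reasoning trace ending in a checkable answer.
    completion = "We need 12 * 7.\n12 * 7 = 84.\nAnswer: 84"
    print(math_reward(completion, "84"))  # 1.0

In practice the checker would be task-specific, for example unit tests for code generation or symbolic/numeric equality for math, but the principle is the same: the reward can be verified mechanically.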

Performance and comparisons

AI2 positions Olmo 3-Base 32B as a top fully open base model at its scale, reporting competitive performance versus open-weight families such as Qwen 2.5 and Gemma 3. Olmo 3-Think 32B is described as a leading open reasoning model, narrowing the gap to Qwen 3 32B thinking models while training on roughly six times fewer tokens. Olmo 3-Instruct 7B is reported to match or outperform several open competitors on instruction and conversational benchmarks.

Openness, reproducibility, and tooling

A distinguishing feature of Olmo 3 is the explicit operationalization of openness. AI2 publishes the Dolma 3 construction recipes, staged pretraining and post-training configurations, checkpoints, evaluation suites (such as OLMES and OlmoBaseEval), and tooling. This end-to-end transparency aims to reduce ambiguity about data quality, long-context training, and reasoning-focused RL workflows, creating a rigorous baseline for follow-up research and controlled experiments.

Why it matters

Olmo 3 provides researchers and practitioners with open, reproducible LLM building blocks that span from raw data to RL-ready variants. The combination of long-context capability, reasoning-focused variants, and a fully documented pipeline lowers the barrier for exploring chain-of-thought research, long-document understanding, and RL on language models while offering a clear reference point for comparing open-weight models.
