
FlexOlmo Revolutionizes Language Model Training Without Data Sharing

FlexOlmo introduces a modular framework that allows training large language models on private datasets without data sharing, achieving strong performance while respecting data governance and privacy constraints.

Overcoming Data Sharing Limitations in LLM Training

Large-scale language models (LLMs) traditionally require centralized access to massive datasets, many of which are sensitive or legally restricted. This presents a major obstacle for organizations with proprietary or regulated data.

Introducing FlexOlmo: Modular and Decentralized Training

FlexOlmo, developed by researchers at the Allen Institute for AI and partners, offers a modular training and inference framework that respects data governance constraints. It enables training language model components independently on separate private datasets without sharing raw data.

Architecture Based on Mixture-of-Experts (MoE)

FlexOlmo builds on a Mixture-of-Experts (MoE) architecture in which each expert is a feed-forward network trained independently on a private dataset. A fixed public model anchors the system; its attention layers and other shared parameters remain frozen during expert training. Sparse activation ensures that only the most relevant experts fire for each input token.

Key features include:

  • Expert routing through domain-informed embeddings without joint training.
  • Bias regularization to balance expert selection.
  • Independent and asynchronous optimization of each expert module.
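The routing scheme above can be sketched in a few lines. This is a minimal illustration, not FlexOlmo's actual implementation: the expert FFNs are stand-in linear maps, and the names (`expert_embs`, `moe_forward`, the optional `bias` term) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

# Stand-in "FFN" per independently trained expert (here just a linear map).
expert_ffns = [rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
               for _ in range(n_experts)]
# Domain-informed expert embeddings used for routing, with no joint training.
expert_embs = rng.normal(size=(n_experts, d_model))

def moe_forward(h, bias=None):
    """Route one token's hidden state h to its top-k experts."""
    logits = expert_embs @ h            # similarity to each expert embedding
    if bias is not None:                # bias term to balance expert selection
        logits = logits + bias
    top = np.argsort(logits)[-top_k:]   # sparse activation: top-k experts only
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                        # softmax over the selected experts
    return sum(wi * (expert_ffns[i].T @ h) for wi, i in zip(w, top))

h = rng.normal(size=d_model)
out = moe_forward(h)
print(out.shape)  # (16,)
```

Because the router scores tokens against fixed embeddings rather than jointly trained gate weights, each expert can be optimized on its own silo and plugged in afterwards.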

FLEXMIX Dataset and Training Setup

The FLEXMIX corpus is split into a public mix and seven closed sets simulating non-shareable domains such as News, Reddit, Code, and Academic Text. Each expert trains on a disjoint subset, reflecting real-world data-silo scenarios.

Performance and Evaluation

FlexOlmo was tested on 31 benchmarks spanning language understanding, question answering, code generation, and math reasoning. It outperformed baseline methods like model soup and weighted ensembling by a significant margin, especially on domain-specific tasks.

Opt-Out Mechanism and Privacy

A standout feature is the deterministic opt-out, allowing exclusion of any expert’s influence at inference without retraining. Privacy evaluations showed low data extraction risks, and the architecture supports integration with differential privacy techniques.
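Because routing is a score over fixed expert embeddings, opting an expert out can be as simple as masking its routing score at inference time. The sketch below is an assumption about how such a mask could work, not FlexOlmo's actual code; `route` and `opted_out` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2
expert_embs = rng.normal(size=(n_experts, d_model))

def route(h, opted_out=()):
    """Select top-k experts, deterministically excluding opted-out ones."""
    logits = expert_embs @ h
    logits[list(opted_out)] = -np.inf   # opted-out expert can never be chosen
    top = np.argsort(logits)[-top_k:]
    return set(top.tolist())

h = rng.normal(size=d_model)
selected = route(h, opted_out={2})
print(2 in selected)  # False
```

The exclusion is deterministic and requires no retraining: the remaining experts simply absorb the routing mass.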

Scalability and Compatibility

Applied to the OLMo-2 7B baseline pretrained on 4 trillion tokens, FlexOlmo enhanced performance by adding experts for Math and Code without retraining the core model, proving its scalability and ease of integration.
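In this modular setup, extending a frozen base amounts to registering a new expert's weights and routing embedding alongside the existing ones. The snippet below is a schematic sketch under that assumption; `add_expert` is a hypothetical helper, not part of any released API.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Existing system: a frozen base plus two previously trained experts.
expert_ffns = [rng.normal(size=(d_model, d_model)) for _ in range(2)]
expert_embs = [rng.normal(size=d_model) for _ in range(2)]

def add_expert(ffn, emb):
    """Register a newly trained expert (e.g. Math or Code) by appending
    its FFN weights and routing embedding; no other parameter changes."""
    expert_ffns.append(ffn)
    expert_embs.append(emb)

add_expert(rng.normal(size=(d_model, d_model)), rng.normal(size=d_model))
print(len(expert_ffns))  # 3
```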

FlexOlmo paves the way for building powerful, privacy-conscious LLMs that comply with data governance policies, making it a breakthrough for organizations needing secure, modular AI solutions.
