OpenZL: Meta's Self-Describing, Format-Aware Compression with a Universal Decoder

Compression as a computational graph

OpenZL treats compression as a directed acyclic graph (DAG) of modular codecs connected by typed message streams. Rather than requiring a new reader for every new compressor, it serializes the compressor’s graph alongside the compressed payload, so every frame is self-describing. A single universal decoder executes the embedded graph, which lets compressors evolve without coordinated reader rollouts.
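To make the idea of a self-describing frame concrete, here is a minimal toy sketch in Python. It is not the OpenZL frame format or API: the codec names ("delta8", "zlib"), the JSON header, and the 4-byte length prefix are all assumptions chosen for illustration, and the "graph" is simplified to a linear chain of stages. The point it shows is that the decoder needs no knowledge of which compressor produced a frame, only the plan embedded in it.

```python
import json
import struct
import zlib


def delta_encode(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte (mod 256)."""
    prev = 0
    out = bytearray()
    for b in data:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)


def delta_decode(data: bytes) -> bytes:
    """Invert delta_encode by accumulating differences (mod 256)."""
    prev = 0
    out = bytearray()
    for d in data:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)


# Toy codec registry: name -> (encode, decode).
# These names are illustrative, not OpenZL codec identifiers.
CODECS = {
    "delta8": (delta_encode, delta_decode),
    "zlib": (zlib.compress, zlib.decompress),
}


def compress(data: bytes, plan: list[str]) -> bytes:
    """Apply the stages in `plan` in order and embed the plan in the frame."""
    payload = data
    for name in plan:
        payload = CODECS[name][0](payload)
    header = json.dumps({"plan": plan}).encode("utf-8")
    # Toy frame layout: 4-byte big-endian header length, header, payload.
    return struct.pack(">I", len(header)) + header + payload


def decompress(frame: bytes) -> bytes:
    """Universal decoder: read the embedded plan and undo it in reverse order."""
    (hlen,) = struct.unpack(">I", frame[:4])
    plan = json.loads(frame[4:4 + hlen])["plan"]
    payload = frame[4 + hlen:]
    for name in reversed(plan):
        payload = CODECS[name][1](payload)
    return payload
```

Two compressors with different plans, for example compress(data, ["zlib"]) and compress(data, ["delta8", "zlib"]), produce frames that the same decompress function can read. A real graph would also carry branching, stream types, and codec parameters, which this sketch omits.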

How it works

Developers provide a high-level description of their data, and OpenZL composes parsing, grouping, transformation, and entropy-coding stages into a DAG tailored to that data. The finalized graph and the compressed bytes together form a single self-describing frame. At decode time, the universal decoder walks the embedded graph specification and reconstructs the data without requiring new reader binaries.
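The parse, group, transform, and entropy-code stages can be illustrated with a small hand-written Python sketch. It does not use OpenZL: the record layout (a 32-bit timestamp, a 16-bit id, and a signed 16-bit reading) is a hypothetical example, zlib stands in for the entropy stage, and the graph is hard-coded rather than composed from a data description.

```python
import struct
import zlib

# Hypothetical fixed-width record: u32 timestamp, u16 sensor id, i16 reading.
RECORD = struct.Struct("<IHh")
MASK32 = 0xFFFFFFFF


def split_streams(blob: bytes):
    """Parse records and group each field into its own typed stream."""
    timestamps, ids, readings = [], [], []
    for t, i, v in RECORD.iter_unpack(blob):
        timestamps.append(t)
        ids.append(i)
        readings.append(v)
    return timestamps, ids, readings


def delta(values):
    """Transform: difference from the previous value, wrapped to 32 bits."""
    return [(b - a) & MASK32 for a, b in zip([0] + values[:-1], values)]


def undelta(deltas):
    """Invert delta() by accumulating differences, wrapped to 32 bits."""
    out, acc = [], 0
    for d in deltas:
        acc = (acc + d) & MASK32
        out.append(acc)
    return out


def encode(blob: bytes) -> list[bytes]:
    """Parse, group by field, delta the timestamps, then compress each stream."""
    timestamps, ids, readings = split_streams(blob)
    streams = [
        struct.pack(f"<{len(timestamps)}I", *delta(timestamps)),
        struct.pack(f"<{len(ids)}H", *ids),
        struct.pack(f"<{len(readings)}h", *readings),
    ]
    return [zlib.compress(s) for s in streams]


def decode(parts: list[bytes]) -> bytes:
    """Reverse encode(): decompress, undo the delta, re-interleave the fields."""
    raw = [zlib.decompress(p) for p in parts]
    n = len(raw[0]) // 4
    timestamps = undelta(list(struct.unpack(f"<{n}I", raw[0])))
    ids = struct.unpack(f"<{n}H", raw[1])
    readings = struct.unpack(f"<{n}h", raw[2])
    return b"".join(
        RECORD.pack(t, i, v) for t, i, v in zip(timestamps, ids, readings)
    )
```

Comparing sum(len(p) for p in encode(blob)) with len(zlib.compress(blob)) on your own data gives a rough feel for how much the field-wise split helps. The outcome depends on the data, and this sketch makes no claim about what OpenZL itself achieves.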

Tooling and APIs

OpenZL includes SDDL, the Simple Data Description Language, which exposes built-in components and APIs for decomposing inputs into typed streams based on a precompiled data description. SDDL can be used from C and from Python, where it lives under openzl.ext.graphs.SDDL. The core library and language bindings are open source; the repository documents C, C++, and Python usage, and community bindings, such as the Rust crate openzl-sys, are already appearing.

Performance and real deployments

Meta’s team reports that OpenZL yields Pareto improvements in compression ratio and throughput over general-purpose codecs across varied real-world datasets. The size of the gain depends on data characteristics and pipeline configuration, so there is no single universal multiplier. Meta also reports that its internal deployments show consistent gains in size and speed, along with faster compressor development cycles.
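For readers unfamiliar with the term, a Pareto improvement here means a configuration that is at least as good on both compression ratio and throughput, and strictly better on at least one. The Python sketch below shows that check. The Point type and its field names are assumptions for illustration, and any numbers you pass in are your own measurements; nothing here reproduces reported results.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    ratio: float        # compression ratio, higher is better
    speed_mbps: float   # throughput, higher is better


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.ratio >= b.ratio and a.speed_mbps >= b.speed_mbps
    strictly_better = a.ratio > b.ratio or a.speed_mbps > b.speed_mbps
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A compressor configuration is a Pareto improvement over a baseline exactly when dominates(new, baseline) returns True.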

Why this matters

OpenZL combines the efficiency of domain-specific, format-aware codecs with the operational simplicity of a single, stable decoder. Because the codec DAG is embedded in each frame and decoded by a universal reader, new compressors no longer require rolling out updated readers, which makes format-aware compression practical at scale.

For additional details, see the OpenZL paper and the project repository, which contains tutorials, code, notebooks, and documentation of the API surfaces for integrating OpenZL into existing workflows.