FLUX.2: 32B Flow Transformer for Production-Grade 4MP Image Generation and Editing

Black Forest Labs released FLUX.2, a 32B flow-matching transformer for 4MP image generation and editing with multi-reference support, open dev weights, and production-ready integrations.

FLUX.2 overview

Black Forest Labs has unveiled FLUX.2, its second-generation image generation and editing system designed for real-world creative workflows. The model targets use cases such as marketing assets, product photography, design layouts, and complex infographics, offering editing support up to 4 megapixels and strong control over layout, logos, and typography.

Product family and deployment options

FLUX.2 is released as a family of variants that cover hosted APIs and open weights:

  • FLUX.2 [pro]: a managed API tier targeting quality competitive with leading closed models, high prompt adherence, and low inference cost. Available in the BFL Playground, the BFL API, and partner platforms.
  • FLUX.2 [flex]: exposes parameters like number of steps and guidance scale, allowing developers to balance latency, text rendering fidelity, and visual detail.
  • FLUX.2 [dev]: the open weight checkpoint derived from the base FLUX.2 model. It combines text-to-image and multi-image editing in one checkpoint and contains 32 billion parameters.
  • FLUX.2 [klein]: an upcoming Apache 2.0 open source variant distilled to a smaller size for constrained setups while retaining many capabilities.

All variants support image editing from text and multiple reference images within a single model, removing the need to maintain separate checkpoints for generation and editing.

Architecture and the FLUX.2 VAE

FLUX.2 uses a latent flow matching architecture. The design couples a Mistral-3 24B vision-language model with a rectified flow transformer that operates on latent image representations. The vision-language model supplies semantic grounding and world knowledge, while the transformer backbone learns spatial structure, materials, and composition.

Training maps noise latents to image latents under text conditioning, enabling the same architecture to support both text-driven synthesis and editing. For editing, latents are initialized from existing images and then updated through the same flow process while preserving structure.
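The noise-to-image mapping described above can be sketched with a toy rectified-flow example. This is an illustrative, self-contained sketch of the general technique, not FLUX.2 code: the model is trained to predict the constant velocity `x1 - x0` along the straight path between a noise latent `x0` and an image latent `x1`, and sampling integrates that velocity field from t=0 to t=1.

```python
# Minimal rectified-flow sketch in plain Python (illustrative, not FLUX.2 code).
# Training regresses a velocity field toward (x1 - x0) at points on the
# straight path x_t = (1 - t) * x0 + t * x1; sampling integrates it with Euler.
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def velocity_target(x0, x1):
    """The regression target a rectified-flow model learns: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(x0, velocity_fn, steps):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data)."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Toy demo: with the true (oracle) velocity the path is a straight line,
# so Euler integration lands exactly on the data latent x1.
x0 = [random.gauss(0, 1) for _ in range(4)]   # "noise latent"
x1 = [0.5, -1.0, 2.0, 0.25]                   # "image latent"
oracle = lambda x, t: velocity_target(x0, x1)
print(euler_sample(x0, oracle, steps=8))      # ≈ x1
```

Editing fits the same machinery: instead of drawing `x0` from pure noise, the trajectory starts from (partially noised) latents of an existing image, so the flow preserves structure while the text conditioning steers the update.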

A new FLUX.2 VAE defines the latent space and balances learnability, reconstruction quality, and compression. The VAE is released separately on Hugging Face under an Apache 2.0 license and serves as the backbone for all FLUX.2 flow models. It can also be reused in other generative systems.
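The compression trade-off a VAE makes can be seen with a little arithmetic. The spatial factor and channel count below are illustrative assumptions (typical for latent diffusion VAEs), not published FLUX.2 VAE specs:

```python
# Back-of-envelope latent sizes for a VAE-defined latent space.
# NOTE: factor=8 and channels=16 are illustrative assumptions,
# not published FLUX.2 VAE parameters.

def latent_shape(height, width, factor=8, channels=16):
    """Latent tensor shape (C, H/f, W/f) for an H x W pixel image."""
    return (channels, height // factor, width // factor)

def compression_ratio(height, width, factor=8, channels=16, img_channels=3):
    """Pixel-to-latent element ratio for the assumed configuration."""
    c, h, w = latent_shape(height, width, factor, channels)
    return (img_channels * height * width) / (c * h * w)

# A 4 MP image (e.g. 2048 x 2048) under the assumed 8x / 16-channel setup:
print(latent_shape(2048, 2048))       # (16, 256, 256)
print(compression_ratio(2048, 2048))  # 12.0
```

The flow transformer then operates on the much smaller latent grid rather than raw pixels, which is what makes 4MP generation and editing computationally tractable.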

Capabilities for production workflows

Documentation and Diffusers integration highlight several production-oriented capabilities:

  • Multi-reference support: combine up to 10 reference images to maintain character identity, product appearance, and consistent style across outputs.
  • Photoreal detail at 4MP: generate and edit images up to 4 megapixels with improved textures, skin rendering, fabrics, hands, and lighting suited for product shots and photoreal use cases.
  • Robust text and layout rendering: handle complex typography, infographics, memes, and user interface layouts with small legible text, addressing a common weakness in older models.
  • World knowledge and spatial logic: more grounded lighting, perspective, and scene composition to reduce artifacts and the synthetic look.

Performance, quantization, and integrations

Full-precision inference requires more than 80 GB of VRAM. However, FLUX.2 [dev] supports quantized pipelines (4-bit and FP8) and offloading strategies that make the model usable on 18 GB to 24 GB GPUs, and even on 8 GB cards with sufficient system RAM. These quantized and offload-friendly profiles lower the barrier for practical deployment.
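These VRAM figures line up with simple weight-memory arithmetic. The sketch below estimates the memory needed just to hold 32B parameters at common precisions; activations, caches, and the 24B text encoder add more on top, which is why full precision exceeds 80 GB while 4-bit weights fit on 18-24 GB cards with offloading:

```python
# Rough VRAM needed to hold 32B parameters at different precisions.
# Weights only: activations, caches, and the vision-language model
# add further overhead on top of these numbers.

def weight_gib(params_billion, bits_per_param):
    """Memory for weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

for name, bits in [("bf16", 16), ("fp8", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_gib(32, bits):.1f} GiB")
# bf16 ~59.6 GiB, fp8 ~29.8 GiB, 4-bit ~14.9 GiB
```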

The release includes clear integration points with Diffusers, ComfyUI, Cloudflare Workers, and other tooling, plus hosted access via BFL Playground and API endpoints.

Licensing and safety

The open-weight FLUX.2 [dev] checkpoint is available alongside an Apache 2.0 FLUX.2 VAE. The core model weights use the FLUX.2-dev Non Commercial License and incorporate mandatory safety filtering.

Why FLUX.2 matters

By combining a 32B rectified flow transformer, a Mistral-3 24B vision-language model, and an Apache 2.0 VAE into a single high-fidelity pipeline, FLUX.2 brings open-weight image models closer to production-grade creative infrastructure. Its VRAM profiles, quantized variants, and integrations make it practical for real workloads beyond benchmark demos.
