Black Forest Labs Launches FLUX.2 [klein] Models
Introducing FLUX.2 [klein], a cutting-edge family of compact models for interactive visual intelligence on consumer hardware.
Compact Models for Consumer Hardware
Black Forest Labs releases FLUX.2 [klein], a compact image model family that targets interactive visual intelligence on consumer hardware. FLUX.2 [klein] extends the FLUX.2 line with sub-second generation and editing, a unified architecture for text-to-image and image-to-image, and deployment options that range from local GPUs to cloud APIs, while maintaining state-of-the-art image quality.
From FLUX.2 [dev] to Interactive Visual Intelligence
FLUX.2 [dev] is a 32-billion-parameter rectified flow transformer designed for text-conditioned image generation and editing, capable of composing from multiple reference images, and primarily runs on data-center-class accelerators. It is tuned for maximum quality and flexibility, with long sampling schedules and correspondingly high VRAM requirements.
FLUX.2 [klein] takes the same design philosophy and compresses it into smaller rectified flow transformers with 4 billion and 9 billion parameters. These models are distilled to very short sampling schedules, support the same text-to-image and multi-reference editing tasks, and are optimized for response times below 1 second on modern GPUs.
Model Family and Capabilities
The FLUX.2 [klein] family consists of four open-weight variants that share a single architecture:
- FLUX.2 [klein] 4B
- FLUX.2 [klein] 9B
- FLUX.2 [klein] 4B Base
- FLUX.2 [klein] 9B Base
FLUX.2 [klein] 4B and 9B are step-distilled models that use four inference steps, making them the fastest options for production and interactive workloads. FLUX.2 [klein] 9B pairs a 9B flow model with an 8B Qwen3 text embedder and is described as the flagship small model on the Pareto frontier of quality versus latency across text-to-image, single-reference editing, and multi-reference generation.
The Base variants are undistilled versions with longer sampling schedules that preserve the complete training signal and produce higher output diversity. They are intended for fine-tuning, LoRA training, research pipelines, and custom post-training workflows where control matters more than minimum latency.
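The practical difference between the distilled and Base variants comes down to how many integration steps the sampler takes. As a rough illustration (not Black Forest Labs' actual sampler), a rectified flow model predicts a velocity field, and sampling integrates it from noise (t=1) toward data (t=0) with simple Euler steps:

```python
import numpy as np

def euler_flow_sample(velocity_fn, x_noise, num_steps=4):
    """Integrate dx/dt = v(x, t) from t=1 (pure noise) down to t=0 (image)
    with fixed-size Euler steps. The distilled FLUX.2 [klein] models use
    4 steps; the Base variants use longer schedules of this kind."""
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    x = x_noise
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * velocity_fn(x, t)  # dt is negative
    return x
```

With a learned velocity network in place of `velocity_fn`, fewer steps means proportionally fewer transformer forward passes, which is where the sub-second latency of the distilled models comes from.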
All FLUX.2 [klein] models are built on the same architecture and support three core tasks: generating images from text, editing a single input image, and performing multi-reference generation and editing, where several input images and a prompt define the target output.
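Because one architecture serves all three tasks, they can be distinguished purely by how many reference images accompany the prompt. A hypothetical request shape (field and task names here are illustrative, not the actual FLUX.2 [klein] API) might look like:

```python
from dataclasses import dataclass, field

@dataclass
class KleinRequest:
    """Hypothetical request shape for the three FLUX.2 [klein] tasks.
    Field and task names are illustrative, not the actual API."""
    prompt: str
    reference_images: list = field(default_factory=list)

    @property
    def task(self) -> str:
        # Zero references: pure generation; one: editing; several:
        # multi-reference generation, where the inputs plus the prompt
        # jointly define the target output.
        n = len(self.reference_images)
        if n == 0:
            return "text-to-image"
        if n == 1:
            return "single-reference editing"
        return "multi-reference generation"
```

The point of the sketch is that no task flag is needed: the same weights handle all three cases, conditioned on however many references are supplied.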
Latency, VRAM, and Quantized Variants
The FLUX.2 [klein] model card provides approximate end-to-end inference times on GB200 and RTX 5090. FLUX.2 [klein] 4B is the fastest variant, at roughly 0.3 to 1.2 seconds per image, while FLUX.2 [klein] 9B targets about 0.5 to 2 seconds at higher quality. The Base models, with their longer sampling schedules, take several seconds but offer more flexibility for custom pipelines.
FLUX.2 [klein] 4B fits in about 13 GB of VRAM, making it suitable for GPUs like the RTX 3090 and RTX 4070. FLUX.2 [klein] 9B requires about 29 GB of VRAM, targeting hardware such as the RTX 4090, so a single high-end consumer card can host the distilled variants with full-resolution sampling.
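The quoted footprints are consistent with simple weight-size arithmetic: at bf16/fp16 precision each parameter takes 2 bytes, and the text embedder, VAE, and activation buffers account for the rest. A back-of-envelope helper (not an official sizing tool):

```python
def weight_vram_gib(params_billion, bytes_per_param=2):
    """GiB needed for model weights alone at a given precision
    (bf16/fp16 = 2 bytes per parameter)."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# 4B flow model: ~7.5 GiB of weights in bf16; the ~13 GB total quoted
# above also covers the text embedder and runtime buffers.
flow_4b = weight_vram_gib(4)
# 9B flow model alone is ~16.8 GiB; adding the 8B Qwen3 embedder
# approaches the quoted ~29 GB once precision and buffers are accounted for.
flow_9b = weight_vram_gib(9)
```

This is only weight arithmetic; real footprints depend on resolution, batch size, and whether the text embedder stays resident on the same GPU.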
To extend compatibility to more devices, Black Forest Labs also releases FP8 and NVFP4 versions for all FLUX.2 [klein] variants, developed in collaboration with NVIDIA. FP8 quantization is reportedly up to 1.6 times faster with 40% lower VRAM usage, while NVFP4 offers up to 2.7 times speedup with 55% lower VRAM usage on RTX GPUs, preserving core capabilities.
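Applying the quoted reductions to the figures above gives a feel for what quantization buys. These are back-of-envelope projections from the stated percentages, not measured numbers:

```python
def quantized_profile(vram_gb, latency_s, vram_reduction, speedup):
    """Project VRAM and latency from the quoted reduction and speedup
    factors (upper-bound estimates, not benchmarks)."""
    return vram_gb * (1 - vram_reduction), latency_s / speedup

# FLUX.2 [klein] 9B at ~29 GB and ~2 s per image (the upper bound above):
fp8_vram, fp8_latency = quantized_profile(29, 2.0, 0.40, 1.6)
nvfp4_vram, nvfp4_latency = quantized_profile(29, 2.0, 0.55, 2.7)
```

At the stated reductions, FP8 would bring the 9B model to roughly 17.4 GB and about 1.25 s, and NVFP4 to roughly 13 GB and about 0.74 s, which is why the quantized 9B becomes viable on 16 GB-class consumer cards.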
Benchmarks Against Other Image Models
Black Forest Labs evaluates FLUX.2 [klein] through Elo-style comparisons on text-to-image, single-reference editing, and multi-reference tasks. Performance charts position FLUX.2 [klein] on the Pareto frontier of Elo score versus latency and VRAM. The commentary indicates that FLUX.2 [klein] matches or exceeds the quality of Qwen-based image models at a fraction of the latency and VRAM, and outperforms Z Image while supporting unified text-to-image and multi-reference editing in one architecture.
Key Takeaways
- FLUX.2 [klein] is a compact rectified flow transformer family with 4B and 9B variants that supports text-to-image, single image editing, and multi-reference generation in one unified architecture.
- The distilled FLUX.2 [klein] 4B and 9B models use four sampling steps optimized for sub-second inference on modern GPUs, while the undistilled Base models employ longer schedules intended for fine-tuning and research.
- Quantized FP8 and NVFP4 variants, developed with NVIDIA, provide up to 1.6 times speedup and approximately 40% VRAM reduction for FP8, and up to 2.7 times speedup with about 55% VRAM reduction for NVFP4 on RTX GPUs.