Liquid AI Releases LFM2.5: A New Compact AI Model Family

An overview of the LFM2.5 model family for on-device and edge applications.

Introduction to LFM2.5

Liquid AI has introduced LFM2.5, a new generation of small foundation models built on the LFM2 architecture and targeted at on-device and edge deployments. The family includes LFM2.5-1.2B-Base and LFM2.5-1.2B-Instruct, along with Japanese, vision language, and audio language variants. Released as open weights on Hugging Face and available through the LEAP platform, LFM2.5 is a notable step forward for compact, on-device models.
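
For readers who want to try the weights, here is a minimal sketch of loading the instruct checkpoint with the Hugging Face transformers library. The repo id "LiquidAI/LFM2.5-1.2B-Instruct" is an assumption based on the naming above; check the Liquid AI organization page on Hugging Face for the exact identifier.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-1.2B-Instruct"  # assumed repo id, verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Summarize LFM2.5 in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))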

Architecture and Training Recipe

LFM2.5 retains the hybrid LFM2 architecture, designed for fast, memory-efficient inference on CPUs and NPUs, while scaling up the data and post-training pipeline. The pretraining budget for the 1.2-billion-parameter backbone grows from 10T to 28T tokens. The instruct variant then undergoes supervised fine-tuning, preference alignment, and large-scale multi-stage reinforcement learning, with a focus on instruction following, tool use, math, and knowledge reasoning.
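
Liquid AI has not published its post-training code, but to make the first stage of that pipeline concrete, the sketch below shows what supervised fine-tuning of the base checkpoint could look like with the TRL library. The dataset and repo ids are placeholders, not Liquid AI's actual recipe.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative SFT stage only; preference alignment and RL would follow.
dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder chat dataset
trainer = SFTTrainer(
    model="LiquidAI/LFM2.5-1.2B-Base",  # assumed repo id
    train_dataset=dataset,
    args=SFTConfig(output_dir="lfm25-sft"),
)
trainer.train()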

Performance of Text Models

LFM2.5-1.2B-Instruct is the flagship general-purpose text model. The Liquid AI team reports benchmark results on GPQA, MMLU Pro, IFEval, and IFBench, with scores of 38.89 on GPQA and 44.35 on MMLU Pro. Competing 1B-class open models, such as Llama-3.2-1B-Instruct and Gemma-3-1B, score significantly lower on these benchmarks.

On IFEval and IFBench, which focus on multi-step instruction following and function calling quality, LFM2.5-1.2B-Instruct reports scores of 86.23 and 47.33, respectively, outperforming other 1B-class baselines.
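
To make the function-calling claim concrete, here is a hedged sketch of a tool-use prompt using the transformers tool-calling convention, which turns a typed, documented Python function into a JSON tool schema. Whether the LFM2.5 chat template defines tool formatting is an assumption, and the repo id is the same assumed one as above.

from transformers import AutoModelForCausalLM, AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 21 C"  # stub; a real tool would call a weather API

model_id = "LiquidAI/LFM2.5-1.2B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
# transformers derives a JSON schema from the function's signature and
# docstring, provided the model's chat template supports tools.
inputs = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))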

Japanese Optimized Variant

LFM2.5-1.2B-JP is optimized for Japanese, targeting tasks such as JMMLU, M-IFEval, and Japanese GSM8K. On these localized benchmarks it competes effectively with larger multilingual models such as Qwen3-1.7B and outperforms general instruct models.

Vision Language Model

LFM2.5-VL-1.6B is an upgraded vision language model that uses LFM2.5-1.2B-Base as its language backbone. With an added vision tower, it targets tasks such as document understanding and multi-image reasoning, and is tuned for visual reasoning and OCR workloads.
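
A minimal sketch of querying the vision language variant about a document image follows, assuming the checkpoint ships a transformers processor and an image-text-to-text head. The repo id "LiquidAI/LFM2.5-VL-1.6B" and the image URL are placeholders.

from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "LiquidAI/LFM2.5-VL-1.6B"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/invoice.png"},  # placeholder
        {"type": "text", "text": "What is the total amount on this invoice?"},
    ],
}]
# The processor fetches the image, applies the chat template, and tokenizes.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])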

Audio Language Model

LFM2.5-Audio-1.5B is an audio language model that accepts and produces both text and audio. Its audio-to-audio path uses an audio detokenizer that is eight times faster than its predecessor while maintaining quality on constrained hardware. The model supports interleaved generation for real-time applications and sequential generation for speech recognition and text-to-speech, and it is trained with quantization-aware techniques so it performs well on devices with limited compute.
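
The announcement does not include inference code, so the sketch below only illustrates the difference between the two generation modes; AudioChat and its methods are hypothetical stand-ins, not Liquid AI's API. Consult the model card for the real loading and inference code.

from typing import Iterator

class AudioChat:  # hypothetical interface, for illustration only
    def generate_interleaved(self, audio_in: bytes) -> Iterator[tuple[str, bytes | str]]:
        """Interleaved mode: yield ("text", ...) and ("audio", ...) chunks as
        they are produced, so a real-time assistant can start speaking before
        the full reply is finished."""
        yield ("text", "Sure, here's the forecast...")
        yield ("audio", b"\x00\x01")  # placeholder audio frame

    def transcribe(self, audio_in: bytes) -> str:
        """Sequential mode: consume the whole input, then emit the complete
        output, as in speech recognition (or, reversed, text-to-speech)."""
        return "full transcript of the input audio"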

Key Takeaways

  • LFM2.5 is a 1.2B scale hybrid model family built on the LFM2 architecture, including Base, Instruct, Japanese, Vision Language, and Audio Language variants, all available as open weights.
  • Pretraining scales from 10T to 28T tokens, and the Instruct model is post-trained with supervised fine-tuning, preference alignment, and multi-stage reinforcement learning.
  • LFM2.5-1.2B-Instruct posts strong text benchmark results, leading competing 1B-class models on GPQA, MMLU Pro, IFEval, and IFBench.
  • Specialized multimodal and regional variants enhance capabilities for edge applications.