Yandex Unveils ARGUS: A Billion-Parameter Transformer for Next-Level Recommendations
Yandex has introduced ARGUS (AutoRegressive Generative User Sequential modeling), a transformer-based recommender framework that scales up to one billion parameters. The release marks a major step forward in modeling long-term user behavior and places Yandex among a small group of tech leaders demonstrating large-scale recommender transformers in production.
Scaling recommender transformers to long horizons
Recommender systems traditionally struggle with short-term memory, limited scalability, and weak adaptability to changing user behavior. Many architectures truncate user histories to a short window of recent interactions, discarding months or years of data. That approach misses long-term habits, subtle shifts in taste, and seasonal cycles, and becomes impractical as item catalogs grow to billions of items. ARGUS addresses these issues by modeling entire behavioral timelines, enabling a long-horizon view that captures evolving intent and recurring patterns without relying solely on recent signals.
Key technical innovations
Dual-objective pre-training: ARGUS decomposes autoregressive learning into two subtasks — next-item prediction and feedback prediction — improving both the imitation of historical system actions and the modeling of genuine user preferences (see the first sketch after this list).
Scalable transformer encoders: Models in the ARGUS family range from 3.2M to 1B parameters. Quality improves consistently with model size, and at the billion-parameter scale ARGUS showed a 2.66% uplift in pairwise accuracy, indicating a clear scaling benefit for recommender transformers.
Extended context modeling: ARGUS processes user histories up to 8,192 interactions in a single pass, enabling personalization that reflects months of behavior rather than only the latest clicks.
Efficient fine-tuning and deployment: A two-tower architecture separates offline embedding computation from online serving, reducing inference cost compared with target-aware or impression-level online models while remaining scalable (see the second sketch after this list).
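As a rough illustration of the dual-objective idea, the first sketch below (plain PyTorch, with hypothetical class and function names; not Yandex's implementation) trains one causal transformer over a user's interaction history with two heads, one for next-item prediction and one for feedback prediction. The exact conditioning and loss formulation in the ARGUS paper may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualObjectiveSketch(nn.Module):
    # Hypothetical names throughout; an illustration of the two pre-training
    # subtasks described above, not Yandex's code.
    def __init__(self, num_items, d_model=256, n_heads=4, n_layers=2, max_len=8192):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model)
        self.feedback_emb = nn.Embedding(2, d_model)   # 0 = negative, 1 = positive feedback
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.next_item_head = nn.Linear(d_model, num_items)  # next-item prediction head
        self.feedback_head = nn.Linear(d_model, 1)            # feedback prediction head

    def forward(self, item_ids, feedback):
        # item_ids, feedback: (batch, seq_len) past interactions in time order
        seq_len = item_ids.size(1)
        pos = torch.arange(seq_len, device=item_ids.device)
        x = self.item_emb(item_ids) + self.feedback_emb(feedback) + self.pos_emb(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(item_ids.device)
        h = self.encoder(x, mask=mask)  # causal attention over the whole history
        return self.next_item_head(h), self.feedback_head(h).squeeze(-1)

def dual_objective_loss(model, item_ids, feedback):
    # Both subtasks predict position t+1 from the prefix ending at t.
    # A full softmax is used here for brevity; billion-item catalogs would
    # typically need a sampled softmax or similar approximation.
    item_logits, fb_logits = model(item_ids[:, :-1], feedback[:, :-1])
    next_item_loss = F.cross_entropy(item_logits.reshape(-1, item_logits.size(-1)),
                                     item_ids[:, 1:].reshape(-1))
    feedback_loss = F.binary_cross_entropy_with_logits(fb_logits, feedback[:, 1:].float())
    return next_item_loss + feedback_loss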
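The two-tower split can be pictured in the same hedged spirit. The second sketch below assumes a user embedding computed offline from the full history and a cheap dot-product scoring step online; the names and architecture details are illustrative, not taken from the paper.

import torch
import torch.nn as nn

class TwoTowerSketch(nn.Module):
    # Illustrative only: the heavy user tower runs offline, while online
    # serving reduces to a dot product between cached user and item vectors.
    def __init__(self, num_items, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.item_table = nn.Embedding(num_items, d_model)  # item tower / embedding table
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.user_tower = nn.TransformerEncoder(layer, n_layers)  # positional encoding omitted for brevity

    @torch.no_grad()
    def offline_user_embedding(self, history_item_ids):
        # history_item_ids: (seq_len,) for one user; run in offline batch jobs.
        x = self.item_table(history_item_ids).unsqueeze(0)  # (1, seq_len, d_model)
        h = self.user_tower(x)                              # (1, seq_len, d_model)
        return h[0, -1]                                     # last position summarizes the history

    @torch.no_grad()
    def online_score(self, user_vec, candidate_item_ids):
        # Cheap online step: score a candidate set against the cached user vector.
        candidates = self.item_table(candidate_item_ids)    # (num_candidates, d_model)
        return candidates @ user_vec                        # one relevance score per candidate

Because the expensive pass over the long history happens offline, the online path only touches the cached user vector and the candidate item embeddings, which is the cost advantage over target-aware models that must re-run the transformer for every impression.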
Real-world deployment and measured gains
ARGUS is already deployed at scale on Yandex’s music platform, serving millions of users. Production A/B tests reported significant quality improvements: a 2.26% increase in total listening time (TLT) and a 6.37% increase in the likelihood of likes. These are the largest gains recorded for any deep learning–based recommender model on the platform.
Future directions and implications
Yandex plans to extend ARGUS to real-time recommendation scenarios, explore feature engineering for pairwise ranking, and adapt the framework to high-cardinality domains like large e-commerce and video platforms. The results suggest that recommender systems can follow a scaling trajectory similar to natural language models, unlocking deeper personalization by modeling long user sequences.
ARGUS is documented in a research paper published by the Yandex team, and the framework represents a notable contribution to large-scale recommendation research and production practice.