Alibaba Unveils Qwen3-Max-Preview — A Trillion-Parameter LLM with 262K Token Context
Overview
Alibaba’s Qwen team introduced Qwen3-Max-Preview (Instruct), their largest language model so far, exceeding one trillion parameters. The model is available via Qwen Chat, the Alibaba Cloud API, OpenRouter, and is set as the default in Hugging Face’s AnyCoder tool.
Scale and Context Limits
Qwen3-Max pushes the scale envelope with over 1 trillion parameters and an ultra-long context window. The model supports up to 262,144 tokens in total, with caps of 258,048 input tokens and 32,768 output tokens. To improve responsiveness in extended interactions, Alibaba implements context caching that accelerates multi-turn sessions.
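These three limits interact: the input and output caps are separate, but their sum must also stay within the total context window. A minimal sketch, using only the figures reported above (the function name and structure are illustrative, not part of any official SDK):

```python
# Illustrative check of a request plan against the context limits
# reported for Qwen3-Max-Preview.
MAX_CONTEXT = 262_144  # total tokens (reported)
MAX_INPUT = 258_048    # input-token cap (reported)
MAX_OUTPUT = 32_768    # output-token cap (reported)

def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Return True if the request stays within all three reported caps."""
    return (
        input_tokens <= MAX_INPUT
        and output_tokens <= MAX_OUTPUT
        and input_tokens + output_tokens <= MAX_CONTEXT
    )

print(fits_context(250_000, 10_000))   # True: within all caps
print(fits_context(258_048, 32_768))   # False: total exceeds 262,144
```

Note that maxing out both caps at once is not possible, since 258,048 + 32,768 exceeds the 262,144-token total.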
Performance and Benchmarks
Early benchmark results position Qwen3-Max ahead of Qwen3-235B-A22B-2507 and competitive with leading models such as Claude Opus 4, Kimi K2, and DeepSeek-V3.1. It shows strong results across reasoning, coding, and general-purpose tasks in suites like SuperGPQA, AIME25, LiveCodeBench v6, Arena-Hard v2, and LiveBench. Although not explicitly marketed as a reasoning-focused model, Qwen3-Max demonstrates emergent structured reasoning capabilities on complex tasks.
Pricing, Access, and Distribution
Unlike some earlier Qwen releases, Qwen3-Max is not provided as open weights. Access is limited to APIs and partner platforms, reflecting Alibaba’s commercial orientation. Alibaba Cloud has implemented tiered token-based pricing:
- 0–32K tokens: $0.861 per million input, $3.441 per million output
- 32K–128K tokens: $1.434 per million input, $5.735 per million output
- 128K–252K tokens: $2.151 per million input, $8.602 per million output
These tiers make the model relatively affordable for short tasks but significantly more expensive for very long-context use cases.
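To make the tier effect concrete, here is a small cost estimator built directly from the table above. It assumes the tier is selected by the request's input-token count (how Alibaba Cloud applies the tiers in practice should be confirmed against its pricing page):

```python
# Hedged sketch of the tiered pricing above; assumes the tier is chosen
# by the request's input-token count.
TIERS = [  # (input-token ceiling, $/1M input, $/1M output)
    (32_000, 0.861, 3.441),
    (128_000, 1.434, 5.735),
    (252_000, 2.151, 8.602),
]

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under the tiered schedule."""
    for ceiling, in_rate, out_rate in TIERS:
        if input_tokens <= ceiling:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    raise ValueError("input exceeds the 252K-token pricing range")

# Same output size, short vs. long input:
print(round(estimate_cost(10_000, 2_000), 4))   # 0.0155
print(round(estimate_cost(200_000, 2_000), 4))  # 0.4474
```

The example shows the nonlinearity: moving from a 10K-token to a 200K-token input raises the per-request cost roughly 29-fold, since both the volume and the per-token rate increase.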
Implications for Research and Adoption
The closed-source approach and API-only distribution will help Alibaba commercialize the model quickly, but may slow uptake in academic and open-source communities that rely on weight access for fine-tuning and reproducibility. The model’s scale and long-context capabilities could, however, spur new commercial applications that require processing of very large documents or prolonged conversational state.
Key Points
- First Qwen model surpassing one trillion parameters, making it Alibaba’s largest LLM to date.
- Ultra-long context support up to 262K tokens with caching for multi-turn speedups.
- Competitive benchmark performance versus major commercial models across multiple tasks.
- Emergent reasoning capabilities observed despite not being positioned primarily as a reasoning model.
- Closed-source distribution and tiered, token-based pricing may limit accessibility for some users.
Where to try it
The preview can be accessed via Qwen Chat and the Alibaba Cloud API. It is also exposed through OpenRouter and used as a default option in Hugging Face’s AnyCoder tool. For additional resources, Alibaba points to its GitHub page for tutorials, code, and notebooks, and encourages following its social channels and communities.
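For developers trying the OpenRouter route, the service exposes an OpenAI-compatible chat-completions endpoint. The sketch below only assembles the request body; the model slug and endpoint path are assumptions based on OpenRouter's usual conventions and should be verified on the model's OpenRouter page before use:

```python
# Sketch of a request body for OpenRouter's OpenAI-compatible
# chat-completions API. Model slug is an assumption; verify on OpenRouter.
import json

ENDPOINT = "https://openrouter.ai/api/v1/chat/completions"  # assumed path

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble the JSON body for a single-turn completion request."""
    return {
        "model": "qwen/qwen3-max",  # assumed slug; check the model page
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_request("Summarize this 200-page contract.")
print(json.dumps(body, indent=2))
```

Sending the request requires an `Authorization: Bearer <OpenRouter API key>` header, which is omitted here.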