
Tencent Releases Hunyuan-A13B: An Efficient MoE Model with 13B Active Parameters, Dual-Mode Reasoning, and 256K Context Support

Tencent introduces Hunyuan-A13B, a highly efficient open-source MoE language model with dual-mode reasoning and support for ultra-long 256K context lengths, achieving state-of-the-art benchmark results.

Introduction to Hunyuan-A13B

Tencent's Hunyuan team has unveiled Hunyuan-A13B, a cutting-edge open-source large language model built on a sparse Mixture-of-Experts (MoE) architecture. The model has 80 billion total parameters, but only 13 billion of them (roughly 16%) are active for any given token during inference. This design strikes a balance between high performance and computational efficiency.

Architecture and Features

Hunyuan-A13B features a fine-grained MoE architecture with one shared expert and 64 non-shared experts, routing each token to 8 of the non-shared experts per forward pass. It contains 32 layers, uses SwiGLU activations, and has a large vocabulary of 128K tokens. Grouped Query Attention (GQA) enables efficient memory use, which is especially important for long-context inference at the supported 256K-token context length.
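
To make the routing concrete, here is a minimal PyTorch sketch of the shared-plus-routed pattern: one always-on shared expert and top-8 gating over 64 routed experts. The hidden sizes and the softmax-over-top-k gating are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """One SwiGLU feed-forward expert: W_d(SiLU(W_g x) * W_u x)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class SharedPlusRoutedMoE(nn.Module):
    """One always-on shared expert plus top-k routing over n_experts."""
    def __init__(self, d_model=512, d_ff=1024, n_experts=64, top_k=8):
        super().__init__()
        self.shared = SwiGLUExpert(d_model, d_ff)
        self.experts = nn.ModuleList(
            SwiGLUExpert(d_model, d_ff) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):
        # x: (tokens, d_model); pick the top-k experts per token.
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        routed = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                routed[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        # Shared expert always contributes; routed outputs are gated.
        return self.shared(x) + routed

print(SharedPlusRoutedMoE()(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

The shared expert gives every token a dense pathway, while the router spreads the remaining capacity across specialists; only the selected experts run per token, which is what keeps the active parameter count at 13B out of 80B.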

The training process includes a 20 trillion token pretraining phase, followed by fast annealing and long-context adaptation phases. The context window is scaled progressively from 32K to 256K tokens using NTK-aware positional encoding, maintaining stable performance across long sequences.
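
The NTK-aware approach extends RoPE by enlarging the rotary base rather than linearly compressing positions, which keeps the highest-frequency rotations roughly intact while stretching the low frequencies to cover longer windows. A commonly used form of the adjustment is sketched below; the exact schedule Hunyuan uses is not spelled out here, so treat the scale factor as an assumption.

```python
import torch

def ntk_scaled_rope_freqs(head_dim: int, base: float = 10000.0,
                          scale: float = 8.0) -> torch.Tensor:
    """Inverse frequencies for NTK-aware RoPE scaling.

    Enlarging the base by scale**(d / (d - 2)) stretches low-frequency
    rotations so that a model trained at 32K can be adapted toward
    longer windows (e.g., scale = 8 targets 256K).
    """
    ntk_base = base * scale ** (head_dim / (head_dim - 2))
    return 1.0 / (ntk_base ** (torch.arange(0, head_dim, 2).float() / head_dim))

print(ntk_scaled_rope_freqs(head_dim=128)[:4])
```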

Dual-Mode Reasoning Capability

A notable innovation is the dual-mode Chain-of-Thought reasoning framework. Users can switch between a fast-thinking mode (/no think) for quick, low-latency responses and a slow-thinking mode (/think) for complex, multi-step reasoning. This lets users match inference compute to task complexity.
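
In practice the mode is selected by a tag in the prompt. The sketch below shows the idea using the transformers chat template; the tag strings are taken from the article, and the model ID is an assumption based on Hugging Face naming conventions, so verify both against the official model card.

```python
from transformers import AutoTokenizer

# Model ID and exact tag placement are assumptions; check the official
# Hunyuan-A13B model card for the released conventions.
tok = AutoTokenizer.from_pretrained("tencent/Hunyuan-A13B-Instruct",
                                    trust_remote_code=True)

fast = [{"role": "user", "content": "/no think What is 17 * 24?"}]
slow = [{"role": "user", "content": "/think Prove that sqrt(2) is irrational."}]

for msgs in (fast, slow):
    prompt = tok.apply_chat_template(msgs, tokenize=False,
                                     add_generation_prompt=True)
    print(prompt[:120], "...")
```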

Post-Training Enhancements

The model undergoes multi-stage supervised fine-tuning and reinforcement learning with task-specific reward models. Reinforcement learning incorporates outcome-based rewards and tool-specific feedback, including sandboxed code execution and rule-based agent validation.
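
As a rough illustration of an outcome-based reward for code tasks, the sketch below scores a candidate solution by whether its unit tests pass. The subprocess-with-timeout harness is only a stand-in for a real isolated sandbox and is not Tencent's pipeline.

```python
import subprocess
import tempfile

def code_outcome_reward(solution: str, tests: str, timeout_s: int = 5) -> float:
    """Binary outcome reward: 1.0 iff the candidate passes its tests.

    A subprocess with a timeout stands in for the isolated sandbox a
    production RL pipeline would use; do not run untrusted code this way.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution + "\n\n" + tests)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True,
                                timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

print(code_outcome_reward("def add(a, b):\n    return a + b",
                          "assert add(2, 3) == 5"))  # -> 1.0
```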

In agent training, various tool-use scenarios were synthesized involving planner, checker, and tool roles, generating over 20,000 format combinations. This robust training enables Hunyuan-A13B to handle real-world workflows such as spreadsheet manipulation, information retrieval, and structured reasoning.
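
The synthesis schema itself is not public, but a single synthesized sample plausibly looks something like the following; every field name here is hypothetical and purely illustrative.

```python
# Hypothetical shape of one synthesized agent-training sample with
# planner, checker, and tool roles; the real corpus schema is not public.
sample = {
    "task": "Sum the 'revenue' column of q3.xlsx",
    "turns": [
        {"role": "planner",
         "content": "Open the spreadsheet, read 'revenue', return the total."},
        {"role": "tool", "name": "spreadsheet.read_column",
         "arguments": {"file": "q3.xlsx", "column": "revenue"}},
        {"role": "tool", "name": "spreadsheet.read_column",
         "result": [1200, 950, 1310]},
        {"role": "checker",
         "content": "Three numeric values returned; summation is valid."},
        {"role": "assistant", "content": "Total Q3 revenue: 3460."},
    ],
}
```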

Benchmark Performance

Hunyuan-A13B delivers state-of-the-art results across multiple benchmarks:

  • Matches or exceeds larger dense and MoE models on MATH, CMATH, and GPQA.
  • Outperforms Qwen3-A22B and DeepSeek R1 in logical reasoning benchmarks (BBH: 89.1; ZebraLogic: 84.7).
  • Strong coding performance with 83.9 on MBPP and 69.3 on MultiPL-E.
  • Leads in agentic benchmarks BFCL-v3 (78.3) and ComplexFuncBench (61.2).
  • Excels in long-context understanding, scoring 87.7 on PenguinScrolls and maintaining 73.9 on RULER at 64K–128K context lengths, outperforming larger models.

Inference and Deployment

Hunyuan-A13B integrates with popular inference frameworks such as vLLM, SGLang, and TensorRT-LLM. It supports various precision formats, including W16A16, W8A8, and FP8 KV cache, along with features like automatic prefix caching and chunked prefill. Tencent reports throughput of up to 1,981.99 tokens per second at batch size 32, making it suitable for real-time applications.
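
A minimal offline-inference sketch with vLLM is shown below. The model ID is assumed from the Hugging Face release, and the FP8 KV cache and prefix-caching settings are mapped onto standard vLLM arguments; check the official deployment docs for the recommended configuration.

```python
from vllm import LLM, SamplingParams

# Model ID assumed from the Hugging Face release; verify before use.
llm = LLM(
    model="tencent/Hunyuan-A13B-Instruct",
    trust_remote_code=True,
    kv_cache_dtype="fp8",        # FP8 KV cache
    enable_prefix_caching=True,  # automatic prefix caching
    max_model_len=32768,         # raise toward 256K if memory allows
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["/think Plan a 3-step data-cleaning workflow."], params)
print(outputs[0].outputs[0].text)
```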

Open Source Availability and Industry Impact

The model is available on Hugging Face and GitHub under a permissive open-source license, designed for both research and production, especially in latency-sensitive and long-context scenarios. By combining scalable MoE architecture with advanced agentic reasoning, Hunyuan-A13B provides a compelling alternative to larger LLMs, enabling wider experimentation and deployment without compromising capabilities.

For more details, see the project's paper and model repositories.
