Tencent Unveils Hunyuan-MT-7B and Chimera-7B: Open-Source Breakthrough in Multilingual Translation
New open models from Tencent Hunyuan
Tencent’s Hunyuan team released two open-source translation systems: Hunyuan-MT-7B, a compact 7B-parameter translation model, and Hunyuan-MT-Chimera-7B, an ensemble-style weak-to-strong fusion model. Both are targeted at multilingual machine translation and were introduced alongside Tencent’s WMT2025 submission, where Hunyuan-MT-7B ranked first in 30 of 31 language pairs.
Model designs and capabilities
Hunyuan-MT-7B is a 7-billion-parameter model engineered for mutual translation among 33 languages. The supported set includes major world languages as well as Chinese minority languages such as Tibetan, Mongolian, Uyghur, and Kazakh. The model is optimized for both high-resource and low-resource scenarios and, according to Tencent, achieves state-of-the-art results among models of comparable size.
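A minimal usage sketch with Hugging Face transformers is shown below. It assumes the checkpoint is published under the repo id tencent/Hunyuan-MT-7B and that a plain-text translation instruction works as a prompt; consult the official model card for the exact prompt template.

```python
# Minimal sketch: translating one segment with Hunyuan-MT-7B via Hugging Face transformers.
# Assumptions: the repo id "tencent/Hunyuan-MT-7B" and the prompt wording below are
# illustrative; check the model card for the recommended prompt template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-MT-7B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

prompt = (
    "Translate the following segment into Chinese, without additional explanation.\n\n"
    "The weather is lovely today."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Strip the prompt tokens and print only the generated translation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```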
Hunyuan-MT-Chimera-7B is an integrated weak-to-strong fusion model. At inference time it takes multiple candidate translations as input and, guided by reward-based aggregation learned through reinforcement learning, produces a single refined output. According to the team, Chimera-7B is the first open-source model of this kind and yields quality gains beyond what any single system produces.
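The following is a conceptual sketch of that fusion flow: sample several candidate translations from a base system, then condition the fusion model on the source plus all candidates. The helper names and the fusion prompt are hypothetical; the released Chimera-7B defines its own input format and was trained with reward signals not shown here.

```python
# Conceptual sketch of weak-to-strong fusion: several candidate translations are
# produced first, then a fusion model refines them into a single output.
# generate_candidates, build_fusion_prompt, and fuse are illustrative helpers,
# not the official Chimera-7B API.
from typing import Callable, List

def generate_candidates(translate: Callable[..., str], source: str, n: int = 4) -> List[str]:
    """Sample n candidate translations from a base MT system (e.g. Hunyuan-MT-7B)."""
    return [translate(source, seed=i) for i in range(n)]

def build_fusion_prompt(source: str, candidates: List[str]) -> str:
    """Condition the fusion model on the source text plus every candidate."""
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    return (
        "Refine the candidate translations of the source text into a single, "
        f"higher-quality translation.\n\nSource:\n{source}\n\nCandidates:\n{numbered}\n"
    )

def fuse(chimera_generate: Callable[[str], str], source: str, candidates: List[str]) -> str:
    """One forward pass of the fusion model yields the aggregated translation."""
    return chimera_generate(build_fusion_prompt(source, candidates))
```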
Training methodology
Tencent describes a five-stage framework used to train these models:
General pre-training: 1.3 trillion tokens spanning 112 languages and dialects. Datasets were assessed for knowledge value, authenticity, and writing style, and tagged with metadata to preserve domain and topic diversity.
MT-oriented pre-training: Monolingual sources (mC4, OSCAR) were filtered with fastText language identification, minLSH deduplication, and KenLM perplexity filtering (a filtering sketch follows this list). Parallel corpora from OPUS and ParaCrawl were filtered with CometKiwi. The pre-training mix replays 20% of the general pre-training data to mitigate catastrophic forgetting.
Supervised fine-tuning (SFT): Conducted in two stages. Stage I used roughly 3 million parallel pairs drawn from Flores-200, WMT test sets, curated Mandarin⇔minority-language data, synthetic pairs, and instruction-tuning examples. Stage II selected about 268k high-quality pairs via automated scoring (CometKiwi, GEMBA) and manual verification.
Reinforcement learning (RL): The team applied the GRPO algorithm with composite reward functions (a reward sketch follows this list). Quality scores came from XCOMET-XXL and DeepSeek-V3-0324, and additional rewards included terminology-aware signals and repetition penalties to avoid degenerate outputs.
Weak-to-strong RL: Used for Chimera-7B. Multiple candidate translations are generated and aggregated with reward-based selection to improve robustness and reduce repetitive errors.
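The filtering sketch referenced above illustrates the monolingual cleaning steps named in the MT-oriented pre-training stage, using the public fastText language-ID model and a KenLM language model. The model paths, the confidence threshold, and the perplexity cutoff are placeholders, and minLSH deduplication is omitted.

```python
# Sketch of monolingual filtering in the spirit of the MT-oriented pre-training stage:
# fastText language identification plus KenLM perplexity filtering.
# Paths and thresholds are placeholders; minLSH deduplication is not shown.
import fasttext  # pip install fasttext
import kenlm     # pip install kenlm (requires a trained .arpa/.bin language model)

lang_id = fasttext.load_model("lid.176.bin")  # public fastText LID model
lm = kenlm.Model("lm.en.bin")                 # placeholder KenLM model for the target language

def keep(doc: str, expected_lang: str = "en", max_ppl: float = 1000.0) -> bool:
    """Keep a document only if the language is confidently right and the LM finds it plausible."""
    labels, probs = lang_id.predict(doc.replace("\n", " "))
    if labels[0] != f"__label__{expected_lang}" or probs[0] < 0.9:
        return False  # wrong or uncertain language
    words = max(len(doc.split()), 1)
    perplexity = 10 ** (-lm.score(doc) / words)  # per-word perplexity from the log10 score
    return perplexity <= max_ppl

# usage: filtered = [d for d in raw_documents if keep(d)]
```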
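The reward sketch referenced above shows how a composite reward of the kind used in the RL stage might combine a quality score, a terminology-coverage signal, and a repetition penalty. The weights, helper functions, and the quality_score callable are hypothetical placeholders, not Tencent's published reward.

```python
# Illustrative composite reward for GRPO-style RL: quality score (e.g. from XCOMET-XXL
# or an LLM judge), terminology coverage, and a repetition penalty.
# All weights and helpers below are assumptions for the sketch.
from collections import Counter
from typing import Callable, Iterable

def repetition_penalty(text: str, max_ratio: float = 0.3) -> float:
    """Return 1.0 for degenerate outputs where one token dominates, else 0.0."""
    tokens = text.split()
    if not tokens:
        return 1.0
    most_common = Counter(tokens).most_common(1)[0][1]
    return 1.0 if most_common / len(tokens) > max_ratio else 0.0

def terminology_reward(hypothesis: str, required_terms: Iterable[str]) -> float:
    """Fraction of required target-side terms that appear in the hypothesis."""
    terms = list(required_terms)
    if not terms:
        return 1.0
    return sum(t in hypothesis for t in terms) / len(terms)

def composite_reward(
    source: str,
    hypothesis: str,
    required_terms: Iterable[str],
    quality_score: Callable[[str, str], float],  # e.g. XCOMET-XXL or an LLM judge, in [0, 1]
    w_quality: float = 0.7,
    w_terms: float = 0.2,
    w_repeat: float = 0.1,
) -> float:
    """Weighted sum of quality and terminology signals minus the repetition penalty."""
    return (
        w_quality * quality_score(source, hypothesis)
        + w_terms * terminology_reward(hypothesis, required_terms)
        - w_repeat * repetition_penalty(hypothesis)
    )
```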
Benchmark and human evaluations
Automatic benchmarks report strong results across multiple test suites:
WMT24pp (English⇔XX): Hunyuan-MT-7B scored 0.8585 on XCOMET-XXL, ahead of larger closed models such as Gemini-2.5-Pro (0.8250) and Claude-Sonnet-4 (0.8120).
FLORES-200 (33 languages, 1056 pairs): Hunyuan-MT-7B achieved 0.8758 (XCOMET-XXL), outperforming open-source baselines including Qwen3-32B (0.7933).
Mandarin⇔Minority languages: Hunyuan-MT-7B reached 0.6082 (XCOMET-XXL), exceeding Gemini-2.5-Pro (0.5811) and demonstrating notable gains in low-resource pairs.
Comparative highlights include margins of 15–65% over Google Translate across the different evaluation categories. Despite its smaller parameter count, Hunyuan-MT-7B also beats specialized translation models such as Tower-Plus-9B and Seed-X-PPO-7B. Chimera-7B contributes an additional improvement of roughly 2.3% on FLORES-200, especially in Chinese⇔Other and non-English⇔non-Chinese directions.
Human evaluation used a custom multi-domain set (social, medical, legal, internet). Average scores were Hunyuan-MT-7B 3.189, Gemini-2.5-Pro 3.223, DeepSeek-V3 3.219, and Google Translate 2.344. These results show that a 7B model can approach the quality of much larger proprietary systems.
Real-world strengths and case studies
The technical report highlights practical translation examples:
Cultural references: Translates the social-platform nickname ‘small red potato’ correctly as REDnote, rather than giving a literal ‘sweet potatoes’ rendering.
Idioms: Properly captures the context of phrases like ‘You are killing me’, rendering it as an expression of amusement rather than a literal, violent statement.
Medical terminology: Produces precise translations for terms such as ‘uric acid kidney stones’ where baselines sometimes generate malformed outputs.
Minority languages: Yields coherent outputs for Kazakh and Tibetan where other systems may fail or produce nonsense.
Chimera benefits: Enhances translations in gaming slang, intensifiers, and sports terminology by aggregating multiple candidates and optimizing with RL.
Implications for research and deployment
By open-sourcing Hunyuan-MT-7B and Hunyuan-MT-Chimera-7B, Tencent provides high-performance, accessible tools for multilingual translation research and real-world deployment. The combination of targeted pre-training, careful data curation, and RL-based refinement demonstrates a practical path for improving translation quality in both high- and low-resource languages. Researchers and engineers can inspect the GitHub repo and technical report for details, data processing recipes, and evaluation protocols to reproduce or extend these results.
For full technical details, see the team’s report and repository:
https://github.com/Tencent-Hunyuan/Hunyuan-MT/blob/main/Hunyuan_MT_Technical_Report.pdf
https://github.com/Tencent-Hunyuan/Hunyuan-MT