Unbabel Launches TOWER+: The Breakthrough Multilingual LLM for Accurate Translation and Instruction Following

Advancing Machine Translation with Large Language Models

Large language models have significantly advanced machine translation, drawing on extensive training data to translate numerous languages and dialects while capturing subtle linguistic nuances. However, fine-tuning these models for translation accuracy often degrades their instruction-following and conversational abilities, while general-purpose models struggle to meet professional standards for translation fidelity. Maintaining cultural nuance, terminological consistency, and formatting across diverse audiences adds further difficulty, especially when the same model must also handle code generation, problem-solving, and user-specific formatting requests. Enterprises demand adaptable systems that cater to domain-specific needs and user preferences without losing fluency.

Challenges in Tailoring Language Models for Translation

Various strategies have been employed to enhance language models for translation accuracy. Fine-tuning on parallel corpora improves translation adequacy and fluency, while continued pretraining on monolingual and parallel data boosts multilingual fluency. Reinforcement learning with human feedback helps align outputs with quality preferences. Proprietary models like GPT-4o and Claude 3.7 lead in translation quality, while open-weight models such as TOWER V2 and GEMMA 2 have matched or outperformed closed-source models in certain languages. These efforts highlight the ongoing challenge of balancing precise translation and broad language capabilities.
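To make the first of these strategies concrete, the sketch below shows one common way of converting a parallel corpus into instruction-style records for supervised fine-tuning of a decoder-only model. The prompt template and field names are illustrative assumptions, not TOWER+'s actual data format.

```python
# Hypothetical sketch: wrapping sentence pairs from a parallel corpus as
# instruction-style (prompt, completion) records for supervised fine-tuning.
# The prompt wording and field names are illustrative, not TOWER+'s format.

def make_sft_example(src_text: str, tgt_text: str,
                     src_lang: str, tgt_lang: str) -> dict:
    """Build one training record from a single sentence pair."""
    prompt = (
        f"Translate the following {src_lang} text into {tgt_lang}.\n"
        f"{src_lang}: {src_text}\n"
        f"{tgt_lang}:"
    )
    return {"prompt": prompt, "completion": " " + tgt_text}

parallel_corpus = [
    ("The weather is lovely today.", "O tempo está ótimo hoje."),
]
sft_records = [
    make_sft_example(src, tgt, "English", "Portuguese")
    for src, tgt in parallel_corpus
]
print(sft_records[0]["prompt"])
```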

Introducing TOWER+: Unified Training for Translation and General Tasks

Unbabel, in collaboration with Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, and MICS, CentraleSupélec, Université Paris-Saclay, introduced TOWER+, a family of models at scales of 2B, 9B, and 72B parameters. The goal was to find an optimal balance between translation specialization and general-purpose utility. Using a unified training pipeline, TOWER+ models aim to achieve high translation performance alongside robust instruction-following and conversational skills, supporting diverse applications.
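A minimal inference sketch with the Hugging Face transformers library is shown below. The repository ID Unbabel/Tower-Plus-9B and the chat-style prompt are assumptions to be checked against the released model cards, not details confirmed by the paper.

```python
# Minimal inference sketch (assumes a recent transformers release with
# chat-message support in the text-generation pipeline).
# The repository ID "Unbabel/Tower-Plus-9B" is an assumption; verify the
# exact checkpoint names and prompt conventions on the released model cards.
from transformers import pipeline

generator = pipeline("text-generation", model="Unbabel/Tower-Plus-9B",
                     device_map="auto")

messages = [
    {"role": "user",
     "content": "Translate the following English text into German: "
                "The invoice is due on Friday."}
]
output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])
```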

TOWER+ Training Pipeline

The training process proceeds in four stages:

1. Continued pretraining on carefully curated data, roughly two-thirds monolingual text and one-third parallel text formatted as translation instructions, plus a small fraction of instruction-like examples, covering 27 languages and dialects and 47 language pairs.
2. Supervised fine-tuning on a blend of translation tasks and diverse instruction-following data, including code generation, mathematical problem-solving, and question answering.
3. Preference optimization to align outputs with human quality judgments using off-policy preference signals.
4. Reinforcement learning with verifiable rewards, using checks such as regex-based verification to reinforce precise compliance with user instructions during translation (a toy example of such a check appears below).
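The exact reward functions used in the final stage are not reproduced here; the sketch below only illustrates the general idea of a regex-based verifiable reward, a binary check that an output satisfies an explicit formatting instruction.

```python
import re

# Toy illustration of a verifiable reward: a binary check that the model's
# output obeys a formatting instruction (here, a dash-bulleted list with a
# required number of items). The actual TOWER+ reward functions are not
# shown here; this only demonstrates the regex-based verification idea.

def bullet_list_reward(output: str, expected_items: int = 3) -> float:
    """Return 1.0 if the output is a dash-bulleted list of the expected length."""
    bullets = re.findall(r"^- .+$", output, flags=re.MULTILINE)
    return 1.0 if len(bullets) == expected_items else 0.0

print(bullet_list_reward("- um\n- dois\n- três"))  # 1.0 (instruction followed)
print(bullet_list_reward("um, dois, três"))        # 0.0 (not a bulleted list)
```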

Benchmark Performance Highlights

The TOWER+ 9B model achieved a 33.47% win rate on multilingual chat prompts and an 84.38 XCOMET-XXL score across 24 language pairs, surpassing open-weight models of comparable size. The flagship 72B-parameter model scored 54.52% on M-ArenaHard, 89.02 on IFEval instruction following, and 83.29 XCOMET-XXL on WMT24++, setting new open-weight benchmarks. On IF-MT, a benchmark that evaluates translation and instruction following jointly, the 72B model scored 5.55 for instruction adherence and 88.95 for translation fidelity, confirming TOWER+ as a state-of-the-art solution for both enterprise use and research.
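The XCOMET-XXL figures above come from Unbabel's open quality-estimation metrics. A minimal scoring sketch using the open-source unbabel-comet package is given below; the checkpoint name and setup are illustrative rather than the paper's exact evaluation harness.

```python
# Sketch of scoring one translation with an XCOMET-style metric via the
# open-source `unbabel-comet` package (pip install unbabel-comet).
# The checkpoint name and this setup are illustrative, not the paper's
# exact evaluation harness; some checkpoints require accepting a license.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/XCOMET-XXL")
model = load_from_checkpoint(model_path)

data = [
    {
        "src": "A fatura vence na sexta-feira.",   # source sentence
        "mt": "The invoice is due on Friday.",     # machine translation
        "ref": "The invoice is due on Friday.",    # reference translation
    }
]
result = model.predict(data, batch_size=1, gpus=1)
print(result.system_score)  # quality score, roughly in [0, 1]
```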

Technical Summary

TOWER+ shows that translation specialization and general-purpose capability need not be traded off against each other: its unified training pipeline places the models on the Pareto frontier between translation quality and instruction following, offering a scalable blueprint for future translation-focused large language models.

For more details, see the original research paper and the models released by the team.