JetBrains Launches Mellum: Open-Source Language Model Tailored for Developers
JetBrains has open-sourced Mellum, a 4-billion-parameter language model specialized for programming tasks, aiming to improve AI-assisted software development.
Mellum: A Language Model Built for Coding
Mellum is designed specifically for software development tasks. JetBrains frames it as an engineering-first model that concentrates on code-related applications such as autocompletion, infilling, and structural code understanding.
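Infilling, often called fill-in-the-middle (FIM), asks a model for the code that belongs between a known prefix and a known suffix, which is what an editor needs when the cursor sits in the middle of a file. The sketch below shows one way a tool might split a buffer at the cursor and assemble such a prompt. The sentinel strings, their order, and the helper names are illustrative assumptions; the article does not state which special tokens Mellum expects, so check the model's tokenizer before reusing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FimTokens:
    # Placeholder sentinels (assumption): each model defines its own special tokens.
    prefix: str = "<fim_prefix>"
    suffix: str = "<fim_suffix>"
    middle: str = "<fim_middle>"


def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split an editor buffer into the text before and after the cursor."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the buffer")
    return source[:cursor], source[cursor:]


def build_fim_prompt(prefix: str, suffix: str, tokens: FimTokens = FimTokens()) -> str:
    """Assemble a prefix-suffix-middle prompt; the model generates what follows the middle sentinel."""
    return f"{tokens.prefix}{prefix}{tokens.suffix}{suffix}{tokens.middle}"


# Hypothetical buffer with the cursor placed right after "return ".
buffer = "def add(a, b):\n    return \n\nprint(add(1, 2))\n"
cursor = buffer.index("return ") + len("return ")
before, after = split_at_cursor(buffer, cursor)
prompt = build_fim_prompt(before, after)
```

The model's continuation of `prompt` is then inserted at the cursor. Some models expect the suffix before the prefix, which is why the token set and ordering are kept in one small, replaceable structure.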
Narrow Yet Deep Specialization
JetBrains describes Mellum as a “focal model”: one that is specialized narrowly but deeply for programming workloads. Unlike general-purpose large language models (LLMs), it is not meant to cover broad natural-language tasks, a trade-off intended to make it more efficient in integrated development environment (IDE) settings.
Wide Language Support
The model supports numerous programming languages including Java, Kotlin, Python, Go, PHP, C, C++, C#, JavaScript, TypeScript, CSS, HTML, Rust, and Ruby, accommodating the diverse needs of modern polyglot development teams.
Architecture and Training Details
Mellum follows a LLaMA-like architecture and was trained from scratch on over 4.2 trillion tokens. The training data came largely from code-rich datasets such as The Stack, StarCoder, and CommitPack, supplemented by English Wikipedia. The model has an 8,192-token (8K) context window and was trained with bf16 mixed precision on a cluster of 256 NVIDIA H200 GPUs connected via InfiniBand. Training took approximately 20 days.
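The training figures above imply a rough throughput, which can be worked out directly. The sketch below is a back-of-the-envelope estimate, not a number reported by JetBrains: it treats the run as exactly 20 days of uninterrupted training and uses the common rule of thumb of about 6 floating-point operations per parameter per token for training a dense transformer, which is an assumption layered on top of the article's figures.

```python
TOKENS = 4.2e12  # training tokens (from the article)
PARAMS = 4e9     # model parameters (from the article)
GPUS = 256       # NVIDIA H200 GPUs (from the article)
DAYS = 20        # approximate duration (from the article), assumed uninterrupted

seconds = DAYS * 24 * 60 * 60
cluster_tokens_per_s = TOKENS / seconds
gpu_tokens_per_s = cluster_tokens_per_s / GPUS

# Assumption: ~6 FLOPs per parameter per token (forward plus backward pass).
total_flops = 6 * PARAMS * TOKENS
gpu_flops_per_s = total_flops / (seconds * GPUS)

print(f"cluster throughput: {cluster_tokens_per_s / 1e6:.2f}M tokens/s")
print(f"per-GPU throughput: {gpu_tokens_per_s:,.0f} tokens/s")
print(f"total compute:      {total_flops:.2e} FLOPs")
print(f"sustained per GPU:  {gpu_flops_per_s / 1e12:.0f} TFLOP/s")
```

Under these assumptions the cluster processes roughly 2.4 million tokens per second, or about 9,500 tokens per second per GPU, for a total on the order of 10^23 FLOPs.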
Benchmark Performance
JetBrains tested Mellum on several benchmarks reflecting key use cases:
- RepoBench v1.1 (8K context): Python exact match (EM) 27.97%, Java EM 31.08%
- SAFIM (Syntax-Aware Fill-in-the-Middle): pass@1 38.11%
- HumanEval Infilling: Single-line 66.21%, Multi-line 38.52%, Random-span 29.70%
All three benchmarks measure how well a model completes interrupted or partial code, the situation that dominates real development workflows, and JetBrains presents the results as evidence of Mellum’s strength in structured code understanding.
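The two metric families in the list above can be sketched in a few lines. Exact match counts a completion as correct only if it equals the reference text, while pass@k estimates the chance that at least one of k sampled completions passes a problem's tests. The code below is a minimal illustration, not the benchmarks' official scoring: the whitespace trimming in `exact_match` is an assumption, and `pass_at_k` uses the widely used unbiased estimator for a single problem.

```python
from math import comb


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions identical to the reference after trimming surrounding whitespace."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them pass the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical data for illustration only.
em = exact_match(["return a + b", "x = 1"], ["return a + b", "x = 2"])
p1 = pass_at_k(n=10, c=3, k=1)
```

With k = 1 the estimator reduces to c / n, the share of samples that pass. A benchmark score is then the average of this value over all problems.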
Open Sourcing Motivations
JetBrains open-sourced Mellum to foster transparency, allow reuse in custom environments, encourage community collaboration, and provide educational value. Both the base model (Mellum-4b-base) and a Python fine-tuned version (Mellum-4b-sft-python) are available under the Apache 2.0 license on Hugging Face.
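Because the weights are published on Hugging Face, the base model can in principle be loaded with the `transformers` library. The following is a minimal sketch under stated assumptions: the repository id `JetBrains/Mellum-4b-base` is inferred from the model name given above, the `complete` helper is hypothetical, and running it requires `transformers`, PyTorch, and enough memory for a 4-billion-parameter model.

```python
MODEL_ID = "JetBrains/Mellum-4b-base"  # assumed Hugging Face repository id


def complete(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a continuation of `prompt`. Downloads the weights on first use."""
    # Imported lazily so this module loads even where the heavy dependencies are missing.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    # token_type_ids are not needed for causal generation, so they are left out.
    inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the generated continuation is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `complete("def fibonacci(n):")` would return the model's suggested continuation. A production setup would load the model once and reuse it rather than reloading it on every call.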
Impact on Developer Tools
With Mellum, JetBrains aims to improve AI-driven developer tooling by offering a compact, efficient model optimized for source code. This fits the company’s broader plan to deploy multiple focal models for specialized programming tasks, such as diff generation and code review assistance, so that AI features can be integrated in a cost-effective and context-aware way.
Mellum is a step toward specialized, practical language models built for software engineering, and it gives future AI-assisted development tools an openly available foundation.