TildeOpen: 30B Open-Source LLM Prioritizing Europe’s Smaller Languages

Release and availability

Latvian language-technology company Tilde released TildeOpen LLM on September 3, 2025. The model is freely available via Hugging Face under the permissive CC-BY-4.0 license. Tilde positions the release as a step toward linguistic equity and EU digital sovereignty, focusing on languages that mainstream models often under-represent.

Model architecture and training

TildeOpen is a dense, decoder-only transformer with roughly 30 billion parameters. Its hyperparameters include 60 layers, an embedding size of 6144, 48 attention heads, a context window of 8192 tokens, SwiGLU activations, RoPE positional encodings, and RMSNorm layer normalization.
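For orientation, those published hyperparameters can be collected into a small config sketch. The field names below are illustrative only and do not mirror Tilde's actual configuration files:

```python
# Illustrative summary of TildeOpen's published hyperparameters.
# Field names are hypothetical; they do not reflect Tilde's real config schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class TildeOpenConfig:
    n_layers: int = 60                 # transformer decoder blocks
    d_model: int = 6144                # embedding / hidden size
    n_heads: int = 48                  # attention heads (head dim = 6144 / 48 = 128)
    context_length: int = 8192         # maximum sequence length in tokens
    activation: str = "swiglu"         # SwiGLU feed-forward activation
    positional_encoding: str = "rope"  # rotary position embeddings
    norm: str = "rmsnorm"              # RMSNorm instead of classic LayerNorm

print(TildeOpenConfig())
```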

Training took place on European infrastructure, specifically the LUMI supercomputer in Finland and the JUPITER system in Germany, using about 2 million GPU hours awarded through the European Commission's Large AI Grand Challenge. The run processed roughly 2 trillion tokens over 450,000 optimizer updates and was orchestrated with scripts inspired by EleutherAI's GPT-NeoX.
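Those two figures imply a very large global batch. Dividing total tokens by optimizer updates gives the approximate tokens consumed per step; this is a back-of-the-envelope estimate from the published totals, not an official specification:

```python
# Back-of-the-envelope: tokens per optimizer update, from the published totals.
total_tokens = 2e12   # ~2 trillion training tokens
updates = 450_000     # ~450k optimizer steps

tokens_per_update = total_tokens / updates
print(f"{tokens_per_update:,.0f} tokens per update")  # ~4,444,444

# At an 8192-token context, that corresponds to roughly this many
# full-length sequences per global batch:
print(f"~{tokens_per_update / 8192:,.0f} sequences per batch")  # ~543
```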

The training regime used a three-stage sampling strategy: a uniform pass across languages, a natural-distribution phase to strengthen high-data languages, and a final uniform sweep to rebalance the mixture and improve fairness for low-resource languages.
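A minimal sketch of what such a staged sampling schedule could look like, assuming hypothetical per-language corpus sizes (the weights, language set, and stage names below are illustrative, not Tilde's published recipe):

```python
import random

# Hypothetical per-language corpus sizes in tokens; the real counts are not
# public in this article, so these numbers are placeholders.
corpus_sizes = {"en": 900e9, "de": 300e9, "pl": 120e9, "lv": 8e9, "mt": 1e9}

def stage_weights(stage: str) -> dict[str, float]:
    """Per-language sampling weights for each of the three stages."""
    if stage in ("uniform_warmup", "uniform_rebalance"):
        # Stages 1 and 3: every language is sampled with equal probability.
        return {lang: 1 / len(corpus_sizes) for lang in corpus_sizes}
    # Stage 2: sample in proportion to available data (natural distribution),
    # which strengthens the high-data languages.
    total = sum(corpus_sizes.values())
    return {lang: size / total for lang, size in corpus_sizes.items()}

def sample_language(stage: str) -> str:
    weights = stage_weights(stage)
    return random.choices(list(weights), weights=list(weights.values()))[0]

for stage in ("uniform_warmup", "natural", "uniform_rebalance"):
    draws = [sample_language(stage) for _ in range(10_000)]
    print(stage, {lang: draws.count(lang) for lang in corpus_sizes})
```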

Tokenizer and language equity

A key technical feature is an “equitable tokenizer” engineered to represent text with similar efficiency across languages. Mainstream tokenizers are trained mostly on English and other high-data languages, so the same sentence can cost far more tokens in, say, Latvian than in English; keeping token counts comparable lowers inference cost and reduces this skew for smaller or morphologically rich European languages.
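The effect is easy to measure in terms of tokenizer “fertility” (tokens per word). The sketch below assumes the checkpoint is published on Hugging Face under an id like TildeAI/TildeOpen-30b; confirm the exact name on Tilde's model page before running, and note the sample sentences are illustrative:

```python
# Compare tokenizer fertility (tokens per word) across a few languages.
# The model id is an assumption; verify it on Tilde's Hugging Face page.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TildeAI/TildeOpen-30b")

samples = {
    "English": "The weather forecast predicts rain tomorrow afternoon.",
    "Latvian": "Laika prognoze rītdienas pēcpusdienai sola lietu.",
    "Lithuanian": "Orų prognozė rytojaus popietei žada lietų.",
}

for lang, text in samples.items():
    n_tokens = len(tokenizer(text)["input_ids"])
    n_words = len(text.split())
    print(f"{lang}: {n_tokens} tokens / {n_words} words "
          f"= {n_tokens / n_words:.2f} tokens per word")
```

An equitable tokenizer should show roughly similar tokens-per-word ratios across such samples, where an English-centric tokenizer would inflate the Baltic ones.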

By design, TildeOpen aims to reduce grammar errors, awkward phrasing, and hallucinations that typically occur when large models are applied to Baltic, Slavic, or other less-resourced languages.

Deployment, privacy and sovereignty

TildeOpen is open-source and can be self-hosted in local data centers or on EU-compliant cloud infrastructure. This enables organizations to meet GDPR and other regional data-protection standards and reduces dependence on US- or Asia-hosted models. Self-hosting and transparent licensing are central to the model’s goal of supporting national and regional digital sovereignty.
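For teams evaluating self-hosting, inference can follow the standard Hugging Face transformers loading path. This is a minimal sketch, again assuming the TildeAI/TildeOpen-30b checkpoint id; a 30B-parameter model in bfloat16 needs roughly 60 GB of accelerator memory, so multi-GPU sharding or quantization is typical:

```python
# Minimal self-hosted inference sketch using Hugging Face transformers.
# Model id is an assumption; confirm it on the Hugging Face model page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TildeAI/TildeOpen-30b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard across available GPUs
)

# This release is a base model: it continues text rather than chatting.
prompt = "Rīga ir"  # "Riga is" in Latvian
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```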

Use cases and roadmap

Tilde describes the released model as a foundational base meant to support specialized downstream models. Expected follow-ups include instruction-tuned variants and translation models tailored to European languages and government or enterprise needs.

Primary applications include translation, government services, education tools, multilingual customer support, speech technologies, and AI assistants that must handle regional languages accurately.

Research context and evaluation

TildeOpen joins a broader research effort exploring multilingual model behavior. Public evaluations show that even strong open models can still hallucinate or struggle with precise lexical choices in Baltic languages, so localized development and evaluation remain important. TildeOpen’s balanced training and tokenizer aim to close some of these gaps, but continued benchmarking is necessary.

Quick facts and FAQs

Q1: What is TildeOpen LLM? TildeOpen is a 30-billion-parameter multilingual LLM trained on EU supercomputers and optimized for European languages, with special attention to under-represented languages.

Q2: How does it differ from mainstream LLMs? TildeOpen uses a balanced training strategy and an equitable tokenizer to improve representation and accuracy for smaller languages rather than prioritizing English.

Q3: Can organizations host it themselves? Yes. The model is open-source under CC-BY-4.0 and can be deployed on local infrastructure or EU-compliant clouds to satisfy data-protection and sovereignty requirements.

Q4: What are common use cases? Government services, translation, education, AI assistants, speech tech, and multilingual customer support—anywhere accurate European language processing is required.

For the model page and technical details, Tilde hosts resources on Hugging Face and GitHub, along with tutorials and notebooks for developers and researchers.