How South Korea Built a Homegrown LLM Ecosystem: HyperClova, AX, Solar Pro and More
South Korea is rapidly developing a homegrown LLM ecosystem, with government-backed programs, corporate models such as HyperClova, AX and Solar Pro, and open-source projects powering Korean-language AI across healthcare, telecom and enterprise.
South Korea is rapidly building a localized large language model ecosystem centered on Korean language understanding, data privacy and sovereign infrastructure. Government funding, corporate R&D and community open-source efforts are converging to produce models optimized for domestic use cases across healthcare, education, telecom and enterprise services.
Government-backed push for sovereign AI
In 2025 the Ministry of Science and ICT launched a 240 billion won program that selected five consortia to develop sovereign LLMs running on local infrastructure. The consortia are led by Naver Cloud, SK Telecom, Upstage, LG AI Research and NC AI. Alongside funding, regulators have moved to create approval frameworks for specialized text-generating systems: in early 2025 the Ministry of Food and Drug Safety published guidelines for approving medical text-generating AI, one of the first national regulatory frameworks of its kind.
Corporate and academic innovations
Several companies and research institutions have released or announced models tailored to Korean needs:
- SK Telecom: AX 3.1 Lite is a 7 billion-parameter model trained from scratch on 1.65 trillion multilingual tokens with a strong emphasis on Korean. It reportedly reaches about 96% of the score of much larger models on KMMLU (Korean reasoning) and 102% on CLIcK (cultural understanding), and is released as open source on Hugging Face for mobile and on-device use (see the loading sketch after this list).
- Naver: HyperClova continues to evolve, with HyperClova X Think (June 2025) improving search-oriented Korean dialogue and retrieval capabilities.
- Upstage: Solar Pro 2 is the only Korean model listed on the Frontier LM Intelligence leaderboard, where it matches much larger international models despite its smaller size.
- LG AI Research: Exaone 4.0 (July 2025) is a roughly 30 billion-parameter model that competes well on global benchmarks and supports multimodal capabilities.
- Seoul National University Hospital: developed a medical LLM trained on 38 million de-identified clinical records, scoring 86.2% on the Korean Medical Licensing Examination versus a human average of 79.7%.
- Mathpresso + Upstage: MATH GPT is a 13 billion-parameter model that outperforms GPT-4 on specific math benchmarks while using far less compute.
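Because SK Telecom's model is distributed as open weights, it can in principle be loaded with the standard Hugging Face transformers API. The sketch below is a minimal illustration under that assumption; the repository id is a guess, so check SK Telecom's Hugging Face organization for the actual name before running.

```python
# Minimal sketch: loading an open-weight Korean model from Hugging Face
# with the standard transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "skt/A.X-3.1-Light"  # hypothetical repo id, verify before use

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Ask a simple Korean question and generate a short answer.
prompt = "대한민국의 수도는 어디인가요?"  # "What is the capital of South Korea?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```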
Open-source and community efforts
Community models and open-source projects address linguistic gaps and promote continual improvement:
- Polyglot-Ko: a family of models (1.3 to 12.8 billion parameters) pretrained on large Korean corpora to handle code-switching and local nuance.
- Gecko-7B: a 7B community model continually pretrained on Korean text.
Technical trends and design choices
Korean developers prioritize efficiency and domain adaptation. Teams size their training runs with Chinchilla-style token-to-parameter reasoning, often training well past the compute-optimal ratio so that 7–30 billion-parameter models can compete with much larger Western models while staying cheap to serve; the quick calculation below illustrates this with the AX 3.1 Lite figures. Domain-specific training yields clear gains: the hospital medical LLM and MATH GPT demonstrate that specialized datasets and objectives can beat generalist giants on targeted tasks.
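A back-of-the-envelope check using the figures quoted earlier (7 billion parameters, 1.65 trillion training tokens) shows how far past the compute-optimal point such a run goes, assuming the roughly 20 tokens-per-parameter rule of thumb from the Chinchilla paper (Hoffmann et al., 2022):

```python
# Token-to-parameter ratio check using the AX 3.1 Lite figures above.
params = 7e9        # 7B parameters
tokens = 1.65e12    # 1.65T training tokens

ratio = tokens / params
chinchilla_optimal = 20 * params  # ~20 tokens/param rule of thumb

print(f"tokens per parameter: {ratio:.0f}")                    # ~236
print(f"Chinchilla-optimal tokens: {chinchilla_optimal:.2e}")  # 1.40e+11

# Training roughly 12x past the compute-optimal point trades extra
# training compute for a smaller, cheaper-to-serve model, which is the
# pattern behind these efficiency-focused 7-30B releases.
```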
Progress is tracked with benchmarks such as KMMLU (Korean reasoning), CLIcK (cultural understanding) and the Frontier LM leaderboard; on many of these measures the Korean models reach parity with leading global systems.
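For readers unfamiliar with how such leaderboard numbers are produced: KMMLU and CLIcK are multiple-choice benchmarks, and a common scoring method is to pick the option to which the model assigns the highest likelihood. The sketch below illustrates that technique on a toy item; the model id is the same placeholder as above, the question is invented rather than real benchmark content, and production harnesses such as EleutherAI's lm-evaluation-harness handle tokenizer boundary cases more carefully.

```python
# Sketch: scoring a multiple-choice item by per-option log-likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "skt/A.X-3.1-Light"  # hypothetical repo id, verify before use
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

def option_loglik(question: str, option: str) -> float:
    """Sum of log-probabilities the model assigns to the option tokens."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Row i of logprobs predicts token i+1 of the full sequence.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    target = full_ids[0, 1:]
    start = prompt_ids.shape[1]
    # Keep only the rows that predict option tokens (positions >= start).
    scores = logprobs[start - 1:].gather(1, target[start - 1:].unsqueeze(1))
    return scores.sum().item()

# Toy item: "Question: chemical formula of water? Answer: "
question = "질문: 물의 화학식은? 답: "
options = ["H2O", "CO2", "NaCl", "O2"]
best = max(options, key=lambda o: option_loglik(question, o))
print(best)  # expected: H2O
```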
Market outlook and deployment
Analysts expect the South Korean LLM market to grow from about 182.4 million USD in 2024 to 1,278.3 million USD by 2030, a projected 39.4% CAGR. Growth will be driven by chatbots, virtual assistants, sentiment analysis and sector-specific tools. Telecom operators are integrating edge-computing LLMs to reduce latency and enhance data security, supported by national infrastructure projects like the AI Infrastructure Superhighway.
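As a quick sanity check on those figures, the implied growth rate can be recomputed directly from the quoted endpoints; the snippet below suggests the 39.4% figure likely follows the common analyst convention of measuring over a 2025–2030 forecast window rather than from the 2024 actual.

```python
# Quick check of the growth rate implied by the quoted endpoints.
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

# 182.4M USD (2024) -> 1,278.3M USD (2030): six full years.
print(f"{cagr(182.4, 1278.3, 6):.1%}")  # -> 38.3%

# The quoted 39.4% is slightly higher, consistent with measuring over
# a 2025-2030 window from an estimated 2025 base.
```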
Why it matters
South Korea's approach reduces dependence on foreign AI vendors, improves privacy by keeping data and models local, and produces culturally aware language models that perform better on Korean tasks. Combining government strategy, corporate investment and open-source momentum, the country is positioning itself as a leader in efficient, domain-specialized LLMs.