NVIDIA Unveils OpenMath-Nemotron Models Dominating Mathematical Reasoning Benchmarks
NVIDIA has released OpenMath-Nemotron-32B and OpenMath-Nemotron-14B-Kaggle, two AI models that set new performance benchmarks in mathematical reasoning and powered the winning entry in the AIMO-2 Kaggle competition.
Advancing Mathematical Reasoning in AI
Mathematical reasoning poses a significant challenge for AI, demanding not only comprehension of abstract concepts but also the ability to execute multi-step logical deductions accurately. While traditional language models excel at generating fluent text, they often fall short when addressing complex mathematical problems that require deep domain expertise and structured reasoning. This gap has propelled research into specialized architectures and training approaches designed to enhance AI's mathematical capabilities.
Introducing OpenMath-Nemotron-32B and 14B-Kaggle
NVIDIA has launched OpenMath-Nemotron-32B and OpenMath-Nemotron-14B-Kaggle, two models fine-tuned extensively on the OpenMathReasoning dataset—a collection of challenging problems from mathematical Olympiads and standardized exams. These models build upon the Qwen transformer architecture and are optimized for accuracy, inference speed, and resource efficiency.
OpenMath-Nemotron-32B, the flagship model with 32.8 billion parameters, uses BF16 tensor operations for efficient computation. It achieves state-of-the-art results on benchmarks such as AIME 2024 and 2025, the Harvard–MIT Mathematics Tournament (HMMT), and the HLE-Math series. In its tool-integrated reasoning (TIR) mode, it attains a 78.4% pass@1 score on AIME24 and 93.3% accuracy with majority voting, outperforming previous top models.
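The two metrics above differ in how sampled solutions are scored. A minimal sketch of the distinction (illustrative only, not NVIDIA's evaluation code; the sample data is made up):

```python
from collections import Counter

def pass_at_1(samples_per_problem, answers):
    # pass@1: a problem counts as solved if its first sampled solution is correct.
    correct = sum(1 for s, a in zip(samples_per_problem, answers) if s[0] == a)
    return correct / len(answers)

def majority_vote_accuracy(samples_per_problem, answers):
    # Majority voting: a problem counts as solved if the most frequent
    # candidate answer across all samples is correct.
    correct = 0
    for samples, answer in zip(samples_per_problem, answers):
        majority, _ = Counter(samples).most_common(1)[0]
        correct += majority == answer
    return correct / len(answers)

# Two toy problems, three sampled answers each.
samples = [["7", "42", "42"], ["13", "10", "13"]]
gold = ["42", "13"]
print(pass_at_1(samples, gold))              # prints 0.5
print(majority_vote_accuracy(samples, gold)) # prints 1.0
```

As the toy example shows, majority voting can recover problems where an individual sample is wrong, which is why the majority-voting score exceeds pass@1.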
Flexible Inference Modes
The 32B model supports three inference modes:
- Chain-of-Thought (CoT): Generates intermediate reasoning steps, achieving 76.5% pass@1 accuracy on AIME24.
- Tool-Integrated Reasoning (TIR): Incorporates external tools for reasoning, boosting accuracy.
- Generative Solution Selection (GenSelect): Produces multiple candidate answers and selects the most consistent one, pushing AIME24 accuracy to 93.3%.
These modes let users trade off detailed step-by-step explanations against concise, high-accuracy final answers, depending on the application.
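At its core, TIR interleaves model text with executed code whose results are fed back into the next generation turn. A minimal sketch of that loop, with a simulated model turn and hypothetical `<tool>...</tool>` delimiters standing in for whatever markup the real models emit:

```python
import re

def run_tool_calls(model_output):
    # Extract code the model emitted between assumed <tool>...</tool> tags,
    # execute it, and collect the results that a TIR loop would append to
    # the conversation before the next generation turn.
    results = []
    for code in re.findall(r"<tool>(.*?)</tool>", model_output, re.DOTALL):
        scope = {}
        exec(code, scope)  # sandbox this in any real deployment
        results.append(scope.get("result"))
    return results

# Simulated model turn: the model decides to delegate arithmetic to a tool.
turn = "To evaluate 2^10, I will compute it: <tool>result = 2 ** 10</tool>"
print(run_tool_calls(turn))  # prints [1024]
```

The actual tool-call format and sandboxing are defined by the NeMo-Skills pipeline; this sketch only illustrates the execute-and-feed-back pattern.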
The 14B-Kaggle Model: Compact and Competitive
OpenMath-Nemotron-14B-Kaggle, with 14.8 billion parameters, is fine-tuned on a subset of the dataset chosen to mirror the contest's format and difficulty. It powered NVIDIA's first-place entry in the AIMO-2 Kaggle competition.
Despite its smaller size, the 14B-Kaggle model delivers strong benchmark results, including a 73.7% pass@1 accuracy on AIME24 in CoT mode and up to 86.7% under GenSelect. Its performance on other benchmarks such as AIME25 and HMMT further underscores its efficiency and adaptability for resource-constrained scenarios.
Open-Source Pipeline and Integration
Both models come with an open-source pipeline integrated into NVIDIA’s NeMo-Skills framework. This includes reference implementations for all inference modes and example code snippets demonstrating transformer pipeline instantiation, data type and device configuration, and output parsing. This setup facilitates rapid prototyping of applications that require step-by-step mathematical reasoning or streamlined final answers.
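A sketch of what such a snippet might look like, covering the instantiation, dtype/device configuration, and output-parsing steps mentioned above. The model ID is the assumed Hugging Face repo name, and the loading function is not invoked here since it requires a large GPU; consult the model card for the exact identifier and prompt format:

```python
import re

def build_generator():
    # Pipeline instantiation with dtype and device configuration
    # (assumed model ID; needs substantial GPU memory, so not called below).
    import torch
    from transformers import pipeline
    return pipeline(
        "text-generation",
        model="nvidia/OpenMath-Nemotron-14B-kaggle",
        torch_dtype=torch.bfloat16,  # BF16, matching the model's weights
        device_map="auto",           # spread layers across available GPUs
    )

def extract_boxed_answer(solution_text):
    # Output parsing: math-reasoning models conventionally wrap the final
    # answer in \boxed{...}; return the last such occurrence, if any.
    matches = re.findall(r"\\boxed\{([^{}]+)\}", solution_text)
    return matches[-1] if matches else None

sample = r"Completing the square gives (x-3)^2 - 4, so the minimum is \boxed{-4}."
print(extract_boxed_answer(sample))  # prints -4
```

Parsing the final boxed answer is what lets applications use the streamlined-answer path without surfacing the full reasoning trace.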
Hardware Optimization and Deployment
Optimized for NVIDIA GPUs—from Ampere to Hopper architectures—these models leverage BF16 tensor formats, CUDA libraries, TensorRT, and Triton Inference Server for efficient, low-latency deployment. This makes them suitable for production environments requiring high throughput and minimal inference delays.
Applications and Future Directions
Potential applications span AI-driven tutoring, preparation for academic competitions, and integration within scientific computing workflows needing formal or symbolic reasoning. Future enhancements may include support for university-level mathematics, multimodal inputs like handwritten equations, and closer integration with symbolic computation engines to verify and augment solutions.