
NVIDIA's Joey Conway Reveals Breakthroughs in Open-Source AI with Llama Nemotron Ultra and Parakeet

NVIDIA's Joey Conway discusses groundbreaking open-source AI models Llama Nemotron Ultra and Parakeet, highlighting innovations in reasoning control, data curation, and rapid speech recognition.

NVIDIA's Open-Source AI Innovations

Joey Conway from NVIDIA shared exciting details about their latest open-source large language models, including Llama Nemotron Ultra and Parakeet TDT. These models push the boundaries of AI performance while remaining accessible for deployment on common hardware.

Llama Nemotron Ultra: Compact Yet Powerful

Llama Nemotron Ultra is a 253 billion parameter model delivering performance comparable to models twice its size, such as Llama 405B. It can run efficiently on a single 8x NVIDIA H100 node thanks to innovative techniques like FFN fusion, which optimizes feed-forward network layers for speed and memory efficiency. This fusion achieves 3 to 5 times speedups and reduces memory footprint, enabling longer context lengths.
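The core idea behind FFN fusion can be sketched in a few lines: consecutive residual feed-forward blocks, which normally run one after another, are replaced by blocks that all read the same input and have their outputs summed, so they can execute in parallel. The toy "FFN" functions below are stand-ins for real transformer feed-forward layers; the approximation is close when each block's residual contribution is small.

```python
# Sketch of the FFN-fusion idea (assumption: toy scalar "FFN" functions
# stand in for transformer feed-forward blocks).

def make_ffn(scale):
    # A toy residual-branch function representing one FFN block.
    return lambda x: scale * x

def sequential(x, ffns):
    # Baseline: each residual FFN block depends on the previous output.
    for f in ffns:
        x = x + f(x)
    return x

def fused(x, ffns):
    # Fused: every block reads the same input and outputs are summed,
    # so the blocks can run in parallel. Exact up to small cross terms.
    return x + sum(f(x) for f in ffns)

ffns = [make_ffn(0.01), make_ffn(0.02)]
print(sequential(1.0, ffns))  # 1.0302
print(fused(1.0, ffns))       # 1.03
```

The fused form trades a tiny approximation error for the ability to compute all blocks concurrently, which is where the reported speedups and memory savings come from.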

A unique feature is the "reasoning on/off" capability, allowing users to toggle detailed reasoning per query. This offers enterprises control over latency and cost while maintaining accuracy when needed. The model excels in both reasoning and instruction-following tasks, simplifying deployment by combining these previously separate capabilities.
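In practice, the toggle is driven by the system prompt. The sketch below builds a chat request for an OpenAI-compatible endpoint; the model name is a hypothetical deployment label, and the "detailed thinking on/off" phrase follows the convention NVIDIA documents for the Nemotron family.

```python
# Sketch of per-query reasoning control (assumptions: an OpenAI-compatible
# chat payload; "llama-nemotron-ultra" is a hypothetical deployment name).

def build_request(question, reasoning=True):
    mode = "on" if reasoning else "off"
    return {
        "model": "llama-nemotron-ultra",
        "messages": [
            # The system prompt flips detailed reasoning on or off.
            {"role": "system", "content": f"detailed thinking {mode}"},
            {"role": "user", "content": question},
        ],
    }

req = build_request("Why is the sky blue?", reasoning=False)
print(req["messages"][0]["content"])  # detailed thinking off
```

Latency-sensitive queries can send `reasoning=False`, while harder questions pay the extra tokens for a full reasoning trace.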

Data Curation and Quality Assurance

NVIDIA emphasizes openness by releasing curated datasets with around 30 million question-answer pairs on Hugging Face. They generate synthetic training data using expert community models, followed by multi-layer quality checks including automated scoring, human review, and diversity assessments. This rigorous pipeline ensures high-quality data for supervised fine-tuning and reinforcement learning phases, improving reasoning, tool calling, chat, and more.
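A multi-stage filter of this kind can be sketched as a simple pipeline over candidate QA pairs; the score field, threshold, and duplicate check below are illustrative choices, not NVIDIA's actual pipeline.

```python
# Minimal sketch of a quality filter for synthetic QA pairs (assumption:
# "auto_score" and the 0.8 threshold are illustrative, not NVIDIA's).

def curate(pairs, min_score=0.8):
    seen_questions = set()
    kept = []
    for pair in pairs:
        q = pair["question"].strip().lower()
        if q in seen_questions:           # diversity gate: drop near-duplicates
            continue
        if pair["auto_score"] < min_score:  # automated-scoring gate
            continue
        seen_questions.add(q)
        kept.append(pair)
    return kept

pairs = [
    {"question": "What is 2+2?", "answer": "4", "auto_score": 0.95},
    {"question": "what is 2+2?", "answer": "four", "auto_score": 0.99},
    {"question": "Name a prime.", "answer": "7", "auto_score": 0.40},
]
print(len(curate(pairs)))  # 1
```

In a real pipeline, further stages (human review, topic-diversity scoring) would sit behind the same kept/dropped decision.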

Reinforcement Learning and Continuous Improvement

After supervised fine-tuning, NVIDIA has begun applying reinforcement learning to further enhance model accuracy. Automated feedback loops grade model outputs across domains such as scientific reasoning and instruction following, enabling continuous improvement beyond what traditional fine-tuning achieves.

Parakeet TDT: Revolutionizing Speech Recognition

Parakeet TDT is an automatic speech recognition model capable of transcribing one hour of audio in one second with only a 6% word error rate, 50 times faster than other open-source models. It uses a Fast Conformer architecture enhanced by depth-wise separable convolution downsampling, limited context attention, sliding window attention, and a novel Token and Duration Transducer (TDT).
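The quoted ~6% figure is word error rate: the word-level edit distance (substitutions, insertions, deletions) between the transcript and the reference, divided by the reference length. A standard implementation, not NVIDIA's evaluation code, looks like this:

```python
# Word error rate as Levenshtein distance over words, divided by the
# reference length (illustrative implementation of the standard metric).

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(round(wer("the cat sat on a mat", "the cat sat on the mat"), 2))  # 0.17
```

A 6% WER means roughly one word in seventeen differs from the reference transcript.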

The TDT allows the model to predict token durations, skipping redundant frames for speedups of 1.5 to 2 times. Additional optimizations like label looping and CUDA graph-based decoding provide further acceleration. NVIDIA plans to expand Parakeet models in size and multilingual support, and to enable real-time streaming capabilities.
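The frame-skipping idea can be sketched with a toy decoder: instead of stepping through every frame, it jumps ahead by each predicted duration. The frame and predictor structures below are illustrative; a real TDT jointly predicts the token and its duration from acoustic features.

```python
# Sketch of duration-based frame skipping (assumption: toy frames that
# directly encode a (token, duration) pair; a real TDT predicts both
# from acoustic features).

def tdt_decode(frames, predict):
    tokens, t, steps = [], 0, 0
    while t < len(frames):
        token, duration = predict(frames[t])
        tokens.append(token)
        t += max(1, duration)  # skip the frames this token spans
        steps += 1
    return tokens, steps

# Toy predictor: each frame already carries its (token, duration).
frames = [("he", 3), ("x", 1), ("x", 1), ("llo", 2), ("x", 1)]
tokens, steps = tdt_decode(frames, lambda f: f)
print(tokens, steps)  # ['he', 'llo'] 2
```

Here the decoder touches 2 frames instead of 5, which is the source of the 1.5–2x speedup the article cites.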

Commitment to Open-Source and Future Directions

All models, datasets, and related software are openly available on Hugging Face, NGC, and GitHub. NVIDIA aims to empower the community with state-of-the-art, efficient, and production-ready AI models. Future work includes multilingual support, smaller edge-optimized models, and advances in real-time speech transcription, maintaining a balance between accuracy, speed, and cost-efficiency.
