NVIDIA Releases Parakeet TDT 0.6B: Ultra-Fast and Highly Accurate Open-Source Speech Recognition Model

Breakthrough in Speech Recognition Performance

NVIDIA has launched Parakeet TDT 0.6B, a 600-million-parameter automatic speech recognition (ASR) model, and released it as open source on Hugging Face under the commercially permissive CC-BY-4.0 license. The model reaches an inverse real-time factor (RTFx) of 3386, meaning it can transcribe one hour of audio in about one second, more than 50 times faster than many other open ASR models.
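RTFx is the ratio of audio duration to processing time (the reciprocal of the classic real-time factor, RTF), so the "one hour in about one second" figure follows from a single division. A minimal sketch of that arithmetic; the function names are illustrative, and the published RTFx is a leaderboard measurement under that benchmark's own hardware and batching setup, not a per-file latency guarantee.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Wall-clock time to transcribe `audio_seconds` of audio at a given RTFx.

    RTFx = audio duration / processing time, so processing time = audio / RTFx.
    """
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


def rtf_from_rtfx(rtfx: float) -> float:
    """Classic RTF (processing time / audio duration) is the reciprocal of RTFx."""
    return 1.0 / rtfx


one_hour = 60 * 60  # seconds of audio
t = processing_seconds(one_hour, 3386)
print(f"{t:.2f} s")  # 3600 / 3386 -> 1.06 s
print(f"RTF = {rtf_from_rtfx(3386):.6f}")
```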

Speed and Accuracy Combined

The model achieves an average word error rate (WER) of 6.05% on the Hugging Face Open ASR Leaderboard, the best transcription accuracy among open-source speech recognition models at the time of release. Combined with its speed, this makes it well suited to enterprise applications such as real-time transcription, voice analytics, call center intelligence, and audio content indexing.
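WER counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a transcript into the reference, divided by the number of reference words (N): WER = (S + D + I) / N. The sketch below computes it with a word-level edit distance; public leaderboards also normalize text (casing, punctuation) before scoring, which this illustration omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as a word-level Levenshtein (edit) distance with dynamic programming.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


ref = "the cat sat on the mat"
hyp = "the cat sat on mat"
print(f"{word_error_rate(ref, hyp):.2%}")  # one deletion out of six words -> 16.67%
```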

Key Technical Features

Parakeet TDT 0.6B pairs a FastConformer encoder with a Token-and-Duration Transducer (TDT) decoder, is fine-tuned on high-quality transcription data, and is optimized for NVIDIA hardware. Important technical highlights include:

600-million-parameter transducer model
Quantized and fused kernels for highly efficient inference
Token-and-Duration Transducer (TDT) decoder that predicts each token together with how many audio frames to skip
Support for accurate timestamp formatting, numerical formatting, and punctuation restoration
Song-to-lyrics transcription, a capability rare among ASR models
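TDT stands for Token-and-Duration Transducer: at each step the joint network predicts both the next token (or blank) and a duration, that is, how many encoder frames to advance, so the decoder can skip frames instead of visiting every one as a standard RNN-Transducer does. The greedy loop below is a simplified sketch of that idea, not NeMo's implementation; the lookup-table `toy_joint`, the token ids, and the duration set `[0, 1, 2, 3, 4]` are hypothetical stand-ins for the real encoder, prediction network, and joint network.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # hypothetical id of the blank symbol

# A joint function maps (frame index, last emitted token) to
# (token scores, duration scores). Real TDT models compute these with a
# neural joint network over encoder and prediction-network outputs.
JointFn = Callable[[int, int], Tuple[Sequence[float], Sequence[float]]]


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda k: xs[k])


def tdt_greedy_decode(joint: JointFn, num_frames: int, durations: Sequence[int],
                      max_symbols_per_frame: int = 5) -> List[int]:
    """Simplified greedy Token-and-Duration Transducer decoding."""
    t = 0
    last_token = BLANK
    symbols_on_frame = 0
    output: List[int] = []
    while t < num_frames:
        token_scores, duration_scores = joint(t, last_token)
        token = argmax(token_scores)
        skip = durations[argmax(duration_scores)]
        if token != BLANK:
            output.append(token)
            last_token = token  # the next prediction is conditioned on this token
            symbols_on_frame += 1
        # A blank must advance time, and so must a frame that hit the symbol cap;
        # otherwise decoding could loop forever on one frame.
        if skip == 0 and (token == BLANK or symbols_on_frame >= max_symbols_per_frame):
            skip = 1
        if skip > 0:
            t += skip
            symbols_on_frame = 0
    return output


# ---- Toy example (all values hypothetical) ----
DURATIONS = [0, 1, 2, 3, 4]
VOCAB_SIZE = 3  # ids: 0 = blank, 1 and 2 = ordinary tokens

# (frame, last token) -> (token id, index into DURATIONS); default is blank, advance 1
TOY_TABLE = {
    (0, BLANK): (1, 0),  # emit token 1, stay on frame 0
    (0, 1): (2, 3),      # emit token 2, jump ahead 3 frames
    (3, 2): (BLANK, 2),  # blank, jump ahead 2 frames
}
calls: List[int] = []  # frames on which the joint function was evaluated


def one_hot(index: int, size: int) -> List[float]:
    return [1.0 if k == index else 0.0 for k in range(size)]


def toy_joint(t: int, last_token: int) -> Tuple[List[float], List[float]]:
    calls.append(t)
    token, dur_idx = TOY_TABLE.get((t, last_token), (BLANK, 1))
    return one_hot(token, VOCAB_SIZE), one_hot(dur_idx, len(DURATIONS))


result = tdt_greedy_decode(toy_joint, num_frames=6, durations=DURATIONS)
print(result, calls)  # [1, 2] [0, 0, 3, 5]
```

On this toy input the decoder emits two tokens while calling the joint function only four times (on frames 0, 0, 3, and 5) across six frames; skipping frames this way is one reason duration-predicting transducers can decode quickly.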

The model leverages NVIDIA’s TensorRT and FP8 quantization technologies to achieve its ultra-fast inference speed.

Benchmark Leadership and Deployment Readiness

On May 5, 2025, Parakeet TDT 0.6B topped the Hugging Face Open ASR Leaderboard with the lowest WER recorded among open models, outperforming competitors such as OpenAI's Whisper. Combined with its throughput, this accuracy makes it a strong candidate for latency-sensitive and real-time applications.

Advanced Transcription Capabilities

Beyond speed and accuracy, Parakeet offers specialized features that enhance transcript quality and usability:

Song-to-lyrics transcription: Enables transcription of sung audio, opening new possibilities in music indexing and media.
Numerical and timestamp formatting: Improves clarity in structured documents like meeting notes and legal transcripts.
Punctuation restoration: Delivers more natural and readable text for downstream natural language processing.
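Word-level timestamps become most useful once they are rendered in a standard format. The sketch below turns a list of (word, start, end) tuples into SubRip (SRT) captions; the input list is hypothetical and stands in for whatever timestamp structure your toolkit returns, and grouping by a fixed word count is just one simple captioning policy.

```python
from typing import List, Tuple

# Hypothetical word-level output: (word, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT caption blocks."""
    blocks = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w for w, _, _ in chunk)
        blocks.append(f"{len(blocks) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(blocks)  # blank line between caption blocks


words = [("Meeting", 0.0, 0.42), ("starts", 0.42, 0.8), ("at", 0.8, 0.95),
         ("3:00", 0.95, 1.5), ("PM.", 1.5, 1.9)]
print(to_srt(words, max_words=3))
```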

These features reduce the need for extensive post-processing or manual editing, which is especially valuable in enterprise use cases.

Strategic Importance for NVIDIA and AI Developers

This open-source release strengthens NVIDIA's position as a leader in AI infrastructure, complementing its portfolio of foundation models such as Nemotron for language and BioNeMo for protein design. For developers, Parakeet TDT 0.6B offers a robust foundation for building advanced speech interfaces across a broad range of applications, from smart devices to multimodal AI agents.

How to Access and Use Parakeet TDT 0.6B

The model is available now on Hugging Face, including all necessary assets such as model weights, tokenizer, and inference scripts. It performs best on NVIDIA GPUs with TensorRT but also supports CPU environments with some throughput trade-offs. This makes it an attractive open-source alternative to commercial speech APIs for transcription services, audio dataset annotation, and voice integration in products.
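A minimal loading sketch, assuming the NVIDIA NeMo toolkit and the Hugging Face model ID `nvidia/parakeet-tdt-0.6b-v2` shown on the model card at the time of writing; check the card for the current ID, install command, and supported audio formats (it describes 16 kHz mono input). The `transcribe` wrapper name is hypothetical.

```python
import sys

# Model ID as listed on Hugging Face at the time of writing (verify on the model card).
DEFAULT_MODEL = "nvidia/parakeet-tdt-0.6b-v2"


def transcribe(paths, model_name=DEFAULT_MODEL):
    """Return one transcript string per audio file path.

    Requires NeMo, e.g. `pip install -U "nemo_toolkit[asr]"`.
    """
    # Imported lazily so this file can be loaded without NeMo installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    outputs = model.transcribe(list(paths))
    # Recent NeMo releases return hypothesis objects with a `.text` attribute;
    # older releases returned plain strings, so handle both.
    return [getattr(out, "text", out) for out in outputs]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python transcribe.py audio1.wav audio2.wav
    for text in transcribe(sys.argv[1:]):
        print(text)
```

The wrapper reloads the model on every call only to keep the sketch short; a service would load it once and reuse it across requests to avoid repeated initialization.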

Explore the model on Hugging Face and follow NVIDIA's updates on Twitter to stay informed about the latest developments.