#ASR

Records found: 14

#ASR 22/01/2026

Microsoft Unveils VibeVoice-ASR for Long-Form Audio

VibeVoice-ASR is a unified speech-to-text model that handles 60-minute audio.

#ASR 20/01/2026

Design a Fully Streaming Voice Agent with Low Latency

Build a low-latency voice agent with ASR, LLM, and TTS streaming.

#ASR 07/01/2026

NVIDIA Unveils Nemotron ASR for Low-Latency Applications

Explore NVIDIA's new Nemotron Speech ASR model designed for voice agents and live captioning with low-latency performance.

#ASR 05/10/2025

Evaluating Voice Agents in 2025: Beyond WER to Task Success, Barge-In, and Noise-Driven Hallucinations

A practical framework for evaluating modern voice agents that extends beyond ASR and WER to include task success, barge-in handling, hallucination-under-noise, safety, and perceptual quality.

#ASR 01/10/2025

Liquid AI Launches LFM2-Audio-1.5B: End-to-End Audio Model with Sub-100 ms Latency

Liquid AI released LFM2-Audio-1.5B, an end-to-end audio-language model that achieves sub-100 ms latency and supports ASR, TTS, and conversational agents from a compact 1.5B-parameter stack.

#ASR 17/09/2025

Build a Real-Time Voice AI Agent with Hugging Face Pipelines (Whisper + FLAN-T5 + Bark)

Learn how to build a real-time voice AI agent with Whisper for ASR, FLAN-T5 for reasoning, and Bark for TTS, all running in Colab with a simple Gradio UI.

#ASR 10/09/2025

Boost ASR Accuracy with SpeechBrain: Build a Denoise + Recognition Pipeline in Python

A hands-on guide to building a compact pipeline with SpeechBrain that generates speech, adds noise, enhances audio with MetricGAN+, and measures ASR word error rates before and after denoising.

#ASR 09/09/2025

Qwen3-ASR Flash: Alibaba's Single-Model Leap in Multilingual, Noise-Robust Speech Recognition

Qwen3-ASR Flash from Alibaba is a single-model ASR that auto-detects and transcribes 11 languages, supports context injection for domain terms, and keeps WER below 8% in noisy or musical audio.

#ASR 01/09/2025

StepFun AI Unveils Step-Audio 2 Mini — Open-Source 8B Speech-to-Speech Model That Tops GPT-4o-Audio

StepFun AI released Step-Audio 2 Mini, an open-source 8B speech-to-speech model that combines unified audio-text tokenization, emotion-aware generation, and retrieval-augmented grounding to beat GPT-4o-Audio on multiple benchmarks.

#ASR 23/08/2025

How Voice Agents Work and the Top 9 Platforms to Try in 2025

Discover how AI voice agents work, why they matter now, and compare the top 9 platforms to build production-grade voice bots in 2025.

#ASR 16/08/2025

NVIDIA Unveils Granary: Europe’s Largest Open-Source Speech Dataset and Ultra-Fast ASR Models

NVIDIA launched Granary, a one-million-hour open-source speech dataset covering 25 European languages, alongside Canary-1b-v2 and Parakeet-tdt-0.6b-v3 models for fast, accurate ASR and speech translation.

#ASR 17/07/2025

NVIDIA Launches Canary-Qwen-2.5B: The Leading ASR-LLM Hybrid Model with Unmatched Accuracy and Speed

NVIDIA's Canary-Qwen-2.5B model sets a new benchmark in speech recognition with a record-low word error rate and fast processing speed. This open-source, commercially licensed hybrid ASR-LLM model enables advanced audio transcription and language understanding.

#ASR 17/07/2025

Mistral AI Unveils Voxtral: Leading Open-Source Speech Recognition Models with Advanced Audio Understanding

Mistral AI launches Voxtral, cutting-edge open-weight speech recognition models that integrate transcription and language understanding with support for long audio contexts and multiple languages.

#ASR 06/05/2025

NVIDIA Releases Parakeet TDT 0.6B: Ultra-Fast and Highly Accurate Open-Source Speech Recognition Model

NVIDIA has released Parakeet TDT 0.6B, an open-source ASR model that transcribes an hour of audio in just one second while achieving top accuracy benchmarks, setting a new industry standard.
