Break the 3-Minute Barrier: Qwen3-ASR-Toolkit Enables Hour-Scale Transcription

What Qwen3-ASR-Toolkit Does

Qwen3-ASR-Toolkit is an MIT-licensed Python command-line tool that makes the Qwen3-ASR-Flash API usable for long audio. The API limits each request to 3 minutes of audio or 10 MB, which suits interactive calls but rules out hour-long recordings and batch archives. The toolkit automates the usual workarounds (VAD-aware segmentation, FFmpeg-based normalization, parallel API dispatch, and post-processing) so that hour-scale transcription runs as a single, stable pipeline.

Key capabilities

How it works under the hood

The toolkit implements a deterministic pipeline:

  1. Load a local file or URL.
  2. Run VAD to find silence boundaries and natural chunk points.
  3. Ensure each chunk is under the 3-minute / 10 MB caps.
  4. Resample and normalize audio to 16 kHz mono with FFmpeg.
  5. Submit chunks in parallel to DashScope/Qwen3-ASR endpoints.
  6. Aggregate segments in order.
  7. Post-process text to deduplicate and remove repetitive artifacts.
  8. Emit a .txt transcript matching the input basename.
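
Steps 2 and 3 can be illustrated with a small planning function. This is a minimal sketch, not the toolkit's actual code: it assumes a VAD has already produced a list of silence timestamps (the `silence_points` argument is hypothetical) and greedily cuts at the latest silence that keeps each chunk at or below the 3-minute cap, falling back to a hard cut when no silence is available.

```python
from typing import List, Tuple

MAX_CHUNK_SECONDS = 180.0  # the 3-minute per-request cap described above


def plan_chunks(
    total_seconds: float,
    silence_points: List[float],
    max_seconds: float = MAX_CHUNK_SECONDS,
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs that cover the audio without exceeding max_seconds.

    silence_points is an assumed VAD output: timestamps in seconds where a cut
    would not split a word. It is not the toolkit's real data structure.
    """
    cuts = sorted(p for p in silence_points if 0.0 < p < total_seconds)
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while total_seconds - start > max_seconds:
        limit = start + max_seconds
        # Prefer the latest silence inside the window; otherwise cut at the cap.
        candidates = [p for p in cuts if start < p <= limit]
        end = candidates[-1] if candidates else limit
        chunks.append((start, end))
        start = end
    chunks.append((start, total_seconds))
    return chunks
```

For a 400-second file with silences at 50, 170, 200, 340, and 390 seconds, this yields three chunks: 0 to 170, 170 to 340, and 340 to 400. As a rough size check, assuming 16-bit PCM WAV output, 180 seconds of 16 kHz mono audio is 16000 × 2 × 180 = 5,760,000 bytes, so a chunk that respects the time cap also stays under 10 MB in that format.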

This approach lets teams batch-process large archives or long live-capture dumps without writing orchestration from scratch.
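
To give a sense of the orchestration the toolkit saves you from writing, here is a hedged sketch of parallel dispatch with in-order aggregation (steps 5 and 6). The `transcribe` callable is a placeholder for whatever performs the real API request, and `fake_transcribe` is a stand-in for demonstration only; neither is the DashScope client or the toolkit's internal API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def transcribe_in_order(
    chunks: List[str],
    transcribe: Callable[[str], str],
    num_threads: int = 4,
) -> str:
    """Run transcribe() on every chunk concurrently and join results in input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        texts = list(pool.map(transcribe, chunks))
    # Drop empty segments (for example, silent chunks) before joining.
    return " ".join(t.strip() for t in texts if t.strip())


def fake_transcribe(chunk_path: str) -> str:
    """Hypothetical stand-in for a real API call."""
    return f"[text of {chunk_path}]"
```

Threads fit this workload because each call spends most of its time waiting on the network. Using `map` instead of collecting futures as they complete keeps the transcript in the original chunk order without any extra bookkeeping.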

Quick start

Prerequisites and installation are minimal: Python 3.8+ and FFmpeg on PATH. Install the toolkit with pip:

pip install qwen3-asr-toolkit

Install FFmpeg if needed:

# System: FFmpeg must be available
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install -y ffmpeg
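
With FFmpeg installed, the 16 kHz mono normalization described earlier can be approximated from Python. This is an illustrative sketch: the function names are hypothetical, and the flags are standard FFmpeg options, not necessarily the exact invocation the toolkit uses.

```python
import subprocess
from typing import List


def ffmpeg_normalize_cmd(src: str, dst: str) -> List[str]:
    """Build an FFmpeg command that converts src to 16 kHz mono audio at dst."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", src,       # input file (audio or video)
        "-vn",           # drop any video stream
        "-ac", "1",      # downmix to one channel (mono)
        "-ar", "16000",  # resample to 16 kHz
        dst,
    ]


def normalize(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(ffmpeg_normalize_cmd(src, dst), check=True)
```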

Configure your DashScope/Qwen API key:

# International endpoint key
export DASHSCOPE_API_KEY="sk-..."

Run the CLI against local files or URLs:

# Basic: local video, default 4 threads
qwen3-asr -i "/path/to/lecture.mp4"

# Faster: raise parallelism and pass key explicitly (optional if env var set)
qwen3-asr -i "/path/to/podcast.wav" -j 8 -key "sk-..."

# Improve domain accuracy with context
qwen3-asr -i "/path/to/earnings_call.m4a" \
  -c "tickers, CFO name, product names, Q3 revenue guidance"

Output is printed and saved as <input_basename>.txt. Useful arguments include -i/--input-file (file path or http/https URL), -j/--num-threads, -c/--context, -key/--dashscope-api-key, -t/--tmp-dir, and -s/--silence.
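
For batch jobs, the CLI can be driven from a short script. The sketch below is one possible wrapper, not part of the toolkit: `build_cli_command` and `transcribe_all` are hypothetical helper names, only the -i, -j, and -c flags listed above are used, and the API key is assumed to come from the DASHSCOPE_API_KEY environment variable.

```python
import subprocess
from typing import Iterable, List, Optional


def build_cli_command(
    input_path: str,
    num_threads: int = 4,
    context: Optional[str] = None,
) -> List[str]:
    """Assemble a qwen3-asr invocation for one file or URL."""
    cmd = ["qwen3-asr", "-i", input_path, "-j", str(num_threads)]
    if context:
        cmd += ["-c", context]
    return cmd


def transcribe_all(
    paths: Iterable[str],
    num_threads: int = 4,
    context: Optional[str] = None,
) -> None:
    """Run the CLI once per input, stopping at the first failure."""
    for path in paths:
        subprocess.run(build_cli_command(path, num_threads, context), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths or context strings that contain spaces.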

Practical tips for production

Where to find code and tutorials

Check the GitHub page for source code, tutorials, and example notebooks. Follow project updates on social channels and community hubs to stay informed about improvements and usage patterns.