#Xiaomi 20/09/2025
MiMo-Audio: Xiaomi's 7B Speech LM Trained on 100M+ Hours with High-Fidelity RVQ Tokens
MiMo-Audio is a 7B audio-language model from Xiaomi that pairs a high-fidelity residual vector quantization (RVQ) tokenizer with patchified next-token training over 100M+ hours of audio, enabling few-shot speech tasks and delivering strong benchmark results.