#speech-to-text + #mistral

Public notes from activescott tagged with both #speech-to-text and #mistral

Wednesday, February 4, 2026

Voxtral Realtime - A 4 billion parameter model aimed at live transcription, achieving “state of the art” transcription with 480ms latency across 13 languages. It can be configurable down to sub-200ms latency.

Performance on the FLEURS benchmark shows that Voxtral Mini Transcribe V2 performs competitively against models from Gemini and OpenAI, with the lowest diarization error rate.