A new study reveals significant safety and fairness gaps in multilingual speech models, finding that only 8% of state-of-the-art releases document any multilingual analysis. To address this, the authors introduce RedVox, a benchmark built on real voices covering unsafe requests across five languages.

  • RedVox evaluates eight state-of-the-art models on stereotypical requests spoken by real voices in English, French, Italian, Spanish, and German.
  • Vulnerabilities persist even under non-adversarial conditions and are more severe in non-English languages than in English.
  • Risks are amplified when the unsafe request is delivered via spoken input rather than text.
  • The study documents privacy challenges associated with collecting speech data from human participants.

This research highlights the urgent need for rigorous multilingual safety evaluations and addresses the sociotechnical difficulties of conducting naturalistic speech safety research.