This study audits the reliability of eight state-of-the-art automatic speech recognition (ASR) models on real-world psychiatric interview data in Kannada, Hindi, and Indian English. The results reveal substantial variability across models and languages: some systems perform competitively on Indian English but fail on regional-language speech.

  • The audit compared IndicWhisper, WhisperLargeV3, Sarvam, GoogleS2T, Gemma3n, OmniLingual, Vaani, and Gemini.
  • Fine-tuning the best-performing open-source models, Gemma3n and OmniLingual, uncovered systematic performance gaps tied to speaker role and gender.
  • The authors propose SamaVaani, a unified debiasing technique that simultaneously improves ASR performance and fairness across demographic groups.

The findings raise concerns about the equitable deployment of ASR in clinical settings; the authors address these concerns with the proposed fairness-aware fine-tuning methods.
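
To make the idea of a performance gap tied to speaker role or gender concrete, the sketch below computes a per-group error rate and the spread between the best and worst group. It is only an illustration under stated assumptions: the summary does not name the evaluation metric, so word error rate (WER) is assumed here, and the field names (`reference`, `hypothesis`, `role`, `gender`), the helper functions, and the four toy utterances are all hypothetical rather than taken from the study.

```python
from collections import defaultdict


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of word tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def group_wer(samples, key):
    """Pooled WER per group: total word errors / total reference words."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for s in samples:
        ref = s["reference"].split()
        hyp = s["hypothesis"].split()
        errors[s[key]] += word_edit_distance(ref, hyp)
        words[s[key]] += len(ref)
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


def wer_gap(per_group):
    """Spread between the worst and best group (0 means no gap)."""
    return max(per_group.values()) - min(per_group.values())


# Toy data, invented for illustration only (not from the study).
samples = [
    {"reference": "i feel tired today", "hypothesis": "i feel tired today",
     "role": "clinician", "gender": "F"},
    {"reference": "how long has this lasted", "hypothesis": "how long has this lasted",
     "role": "clinician", "gender": "M"},
    {"reference": "i cannot sleep at night", "hypothesis": "i can sleep night",
     "role": "patient", "gender": "F"},
    {"reference": "about two weeks now", "hypothesis": "about two weeks",
     "role": "patient", "gender": "M"},
]

by_role = group_wer(samples, "role")
by_gender = group_wer(samples, "gender")
print("WER by role:", by_role, "gap:", wer_gap(by_role))
print("WER by gender:", by_gender, "gap:", wer_gap(by_gender))
```

Pooling errors over all reference words in a group, rather than averaging per-utterance WER, keeps short utterances from dominating the estimate. A fairness-aware method would aim to lower both the overall error rate and this gap; how SamaVaani does so is not described in this summary.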