This study compared the recognition performance of human listeners with that of three state-of-the-art, off-the-shelf automatic speech recognition (ASR) systems (Whisper-large-V3, Google Chirp 3, and Omnilingual) on continuous read and spontaneous Dutch speech from a single speaker with severe dysarthria.

  • Both human listeners and the three ASR systems exhibited average word error rates (WER) exceeding 70% on the unmodified data.
  • Fine-tuning the models on dysarthric speech significantly reduced WER, although overall rates remained above 23%.
  • The personalized (fine-tuned) dysarthric speech recognition (DSR) models outperformed human listeners, approaching performance levels useful for supporting day-to-day communication.
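All results above are reported as word error rate (WER). This is the number of word substitutions (S), deletions (D), and insertions (I) needed to turn a system's transcript into the reference, divided by the number of reference words (N): WER = (S + D + I) / N. Because insertions are counted, WER can exceed 100%. The sketch below shows that standard definition using a word-level Levenshtein distance. It is an illustration only: the function name `word_error_rate` is hypothetical, and it assumes whitespace tokenization with no text normalization. The study's exact scoring procedure, including normalization and per-utterance versus pooled averaging, is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER = (S + D + I) / N via word-level Levenshtein distance.

    Hypothetical helper: assumes whitespace tokenization and no normalization
    (casing, punctuation, numerals are compared as-is).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution (zit -> zat) and one deletion (de) over 6 reference words.
print(round(word_error_rate("de kat zit op de mat", "de kat zat op mat"), 3))  # 0.333
```

A WER above 70%, as reported for the listeners and off-the-shelf systems, therefore means that on average more than seven edits were needed for every ten reference words.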

The findings indicate that while dysarthric speech recognition is highly challenging, personalized models offer a viable path toward supporting daily communication for speakers with severe dysarthria.