A study demonstrates that adapting foundation automatic speech recognition (ASR) models to individual speakers can significantly improve performance on dysarthric speech, which is often poorly handled by standard systems. The researchers built a personalized system using the TEQST tool to collect 92 hours of read speech and 8.8 hours of user corrections from a mobile app.

  • Fine-tuning Whisper with only 1.4 hours of adaptation data reduced the word error rate to 15.8%.
  • With 22.5 hours of adaptation data, the word error rate fell further to 10.7%.
  • The best result, a word error rate of 9.7%, was achieved by training on all available data, including the user corrections.
  • Using LoRA adaptation or Qwen3-ASR as the foundation model yielded worse results in this specific setting.
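
All of the figures above are word error rates (WER), conventionally defined as WER = (S + D + I) / N. Here S, D, and I are the word substitutions, deletions, and insertions needed to turn the system's transcript into the reference, and N is the number of reference words. The following minimal Python sketch computes WER with a word-level Levenshtein distance. It assumes plain whitespace tokenization with no text normalization. The study's actual scoring setup (casing, punctuation handling, and so on) is not described here, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance.

    Assumption: words are split on whitespace and no normalization
    (lowercasing, punctuation removal) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete reference word r
                curr[j - 1] + 1,     # insert hypothesis word h
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


ref = "the quick brown fox jumps"
hyp = "the quick brown box jumps over"
# One substitution (fox -> box) plus one insertion (over) over 5 reference words.
print(f"{word_error_rate(ref, hyp):.1%}")  # prints 40.0%
```

Read this way, a WER of 9.7% corresponds to roughly one substitution, deletion, or insertion for every ten reference words.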

The findings indicate that personalized fine-tuning makes foundation ASR models substantially more effective for dysarthric speech and suitable for practical deployment.