A synthetic audio generation framework is introduced to address data scarcity in Air Traffic Control (ATC) speech recognition. It uses neural techniques such as Text-to-Speech (TTS) and accent conversion to simulate non-native English accents, with the goal of improving Automatic Speech Recognition (ASR) performance. In experiments on the ATCO2 corpus, the Whisper model achieves lower word error rates when fine-tuned on synthetic data or on a mix of real and synthetic data.
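For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch under stated assumptions, not the evaluation code used in the experiments: the function name `word_error_rate`, the lowercasing and whitespace tokenization, and the sample transcripts are all illustrative choices.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is lowercased and split on whitespace. Real ATC scoring
    pipelines may normalize callsigns and numbers differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts (made up for this sketch, not taken from ATCO2).
reference = "climb to flight level one two zero"
hypothesis = "climb flight level one too zero"
wer = word_error_rate(reference, hypothesis)  # one deletion, one substitution
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.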
Synthetic Audio Framework Improves ATC Speech Recognition