A flow-matching-based text-to-speech model is introduced to simulate the Lombard effect, in which humans speak louder and more clearly in noisy environments. The model enables continuous, disentangled control of vocal effort and articulation, and supports word-level emphasis for clarity. Experiments show improved acoustic clarity and intelligibility in noisy conditions compared with baseline systems.
Flow-Matching TTS Model Simulates Lombard Effect