A new evaluation framework for text-to-speech voice reconstruction introduces subjective and objective measures to assess perceived intelligibility and speaker identity. It addresses limitations of existing methods by proposing a dual-reference distributional metric that better captures the trade-off between intelligibility and identity, validated across 193 speakers using 17 zero-shot TTS systems.
Evaluation Framework for TTS Voice Reconstruction