PASQA is a speech quality assessment model that evaluates pitch-accent correctness in synthetic Japanese speech. It uses a dataset with controlled accent errors and combines self-supervised learning, mora-conditioned fusion, a ranking loss, and accent-error localization. The model detects accent errors accurately across speakers and aligns more closely with human judgments than conventional models.
PASQA: Pitch-Accent-Focused Speech Quality Model
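The summary names a ranking loss as one of the model's components but does not give its form. As a rough illustration of the general idea only, the sketch below implements a generic pairwise margin ranking loss in plain Python. The function name `pairwise_ranking_loss`, the `margin` value, and the example scores are all assumptions made for illustration and are not taken from PASQA.

```python
from itertools import combinations


def pairwise_ranking_loss(predicted, human, margin=0.1):
    """Mean hinge loss over all utterance pairs whose human scores differ.

    Hypothetical helper: a generic margin ranking loss, not PASQA's
    actual formulation, which the summary does not specify.
    """
    losses = []
    for i, j in combinations(range(len(predicted)), 2):
        if human[i] == human[j]:
            continue  # tied human scores carry no ordering information
        # sign is +1 when utterance i should score higher than j, else -1
        sign = 1.0 if human[i] > human[j] else -1.0
        # penalize pairs not separated by at least `margin` in the right order
        losses.append(max(0.0, margin - sign * (predicted[i] - predicted[j])))
    return sum(losses) / len(losses) if losses else 0.0


# Illustrative (made-up) scores for three utterances
human_scores = [5, 3, 1]
well_ordered = pairwise_ranking_loss([4.0, 3.0, 1.0], human_scores)
mis_ordered = pairwise_ranking_loss([1.0, 3.0, 4.0], human_scores)
print(well_ordered < mis_ordered)
```

A loss of this kind rewards predictions that preserve the ordering of human ratings rather than their absolute values, which is one plausible reason a quality model would pair it with a regression objective.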