Listenable Interpretable Speaker Embeddings

LISE decomposes speaker embeddings into interpretable components without annotations while preserving automatic speaker verification (ASV) performance. In listening experiments, human participants correctly distinguish speakers with 83.9% accuracy, which supports the interpretability of the components.