SpeechEQ: Benchmarking Emotional Intelligence in Socially Aware Voice Conversational Models
The authors introduce SpeechEQ, a comprehensive framework for evaluating the sociolinguistic reasoning of Speech-Language Models. Existing evaluations rely on isolated text or passive acoustic perception, and so often overlook the complex cross-modal reasoning that active dialogue requires. The framework includes a validated dataset of 2,265 dialogues spanning 15 Emotional Quotient subscales grounded in EQ-i 2.0 theory. It also features a multi-turn evaluation protocol scored with the proposed Spoken EQ score, which is inspired by human EQ assessments. Experiments reveal limitations in how both Speech Emotion Recognition models and end-to-end models interpret paralinguistic cues in speech. While end-to-end architectures outperform cascaded systems, current multimodal models remain bottlenecked by specific barriers, including a text-reliant modality shortcut, an alignment-induced safety trap, and contextual amnesia.