Researchers introduce DyadEE, a dataset for detecting emotional entrainment in dyadic speech, and propose TRACE, a window-level framework that models these interactions as ordered sequences of acoustic embeddings. The study demonstrates that incorporating conversational context and relationship information significantly improves detection accuracy.
- The DyadEE dataset contains both emotionally entrained conversations and synthetic interactions in which entrainment is disrupted by partner swapping and emotion resynthesis.
- TRACE treats each sample as an interaction trace built from Whisper representations fine-tuned for emotion, rather than from pooled utterance-level representations.
- The model achieves a best accuracy of 97.01% on the DyadEE dataset by leveraging temporal, relationship-aware modeling.
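
The two ideas in the list above, partner swapping and ordered window-level traces, can be illustrated with a small Python sketch. This is not the authors' implementation. The function names, the random vectors standing in for Whisper embeddings, the rotation-based swap, and the choice to interleave the two speakers' windows are all assumptions made for illustration.

```python
import random


def make_windows(n_windows, dim, rng):
    """Hypothetical stand-in for per-window acoustic embeddings.

    A real pipeline would take these from a speech encoder; random
    vectors are used here only so the sketch is self-contained.
    """
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_windows)]


def pooled(windows):
    """Mean-pool all windows into one vector; window order is discarded."""
    n = len(windows)
    return [sum(column) / n for column in zip(*windows)]


def interaction_trace(speaker_a, speaker_b):
    """Build an ordered sequence of (speaker, embedding) pairs.

    Interleaving the speakers window by window is an assumption of this
    sketch, not a detail stated in the summary.
    """
    trace = []
    for a, b in zip(speaker_a, speaker_b):
        trace.append(("A", a))
        trace.append(("B", b))
    return trace


def swap_partners(dyads, rng):
    """Pair each speaker A with the speaker B of a different conversation.

    Rotating the partner list by a non-zero shift guarantees that no
    dyad keeps its original partner.
    """
    if len(dyads) < 2:
        raise ValueError("need at least two dyads to swap partners")
    shift = rng.randrange(1, len(dyads))
    return [
        (dyads[i][0], dyads[(i + shift) % len(dyads)][1])
        for i in range(len(dyads))
    ]


rng = random.Random(0)
# Three toy conversations, each with 4 windows of 3-dimensional embeddings.
dyads = [(make_windows(4, 3, rng), make_windows(4, 3, rng)) for _ in range(3)]
swapped = swap_partners(dyads, rng)  # disrupted (non-entrained) pairings
```

In this sketch, reversing the window order leaves `pooled(...)` unchanged up to floating-point rounding but produces a different `interaction_trace(...)`, which is the kind of temporal information a pooled representation cannot carry.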