Where Do Models Find Happiness? Emotion Vectors in Open-Source LLMs

This study investigates the presence and structure of emotion vectors in two open-weight large language models, Apertus-8B-Instruct-2509 and Gemma-4-E4B-it. It finds that both models encode a valence geometry that correlates strongly with human psychological structures, at levels comparable to those previously observed in Claude Sonnet 4.5.

Valence correlations reached r = 0.76 for Apertus-8B-Instruct-2509 and r = 0.83 for Gemma-4-E4B-it, compared to r = 0.81 for Claude Sonnet 4.5.
The depth at which the valence representation emerges differs between models: in Gemma-4-E4B-it it is strong in early layers but collapses in later layers, whereas in Apertus-8B-Instruct-2509 it appears only at mid-depth.
Arousal encoding is sensitive to the extraction corpus: both models show stronger alignment with Gemma-generated stories (r up to 0.45) than with Apertus-generated ones (r ≤ 0.21).
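
This summary does not spell out how the correlations above are computed. As a rough illustration only, the sketch below assumes one plausible setup: each layer yields one vector per emotion, the vectors are projected onto their first principal component, and the projection is compared with human valence ratings using Pearson's r. The function names, array shapes, and the synthetic data are all hypothetical and are not taken from the authors' code or dataset.

```python
import numpy as np


def valence_correlation(emotion_vectors, human_valence):
    """|Pearson r| between the first principal component of the
    emotion vectors and a list of human valence ratings.

    emotion_vectors: array of shape (n_emotions, d_model)  [assumed layout]
    human_valence:   array of shape (n_emotions,)
    """
    # Center the vectors so the SVD yields principal axes.
    centered = emotion_vectors - emotion_vectors.mean(axis=0, keepdims=True)
    # Rows of vt are principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1_scores = centered @ vt[0]
    r = np.corrcoef(pc1_scores, human_valence)[0, 1]
    # The sign of a principal component is arbitrary, so report |r|.
    return abs(r)


def layerwise_valence_correlation(vectors_by_layer, human_valence):
    """Apply valence_correlation to every layer's emotion vectors."""
    return [valence_correlation(v, human_valence) for v in vectors_by_layer]


# Synthetic stand-in data (NOT results from the study).
rng = np.random.default_rng(0)
n_emotions, d_model = 20, 64
valence = rng.uniform(-1.0, 1.0, size=n_emotions)

# "Layer" 0: valence planted along one dominant direction plus small noise.
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)
planted = 5.0 * np.outer(valence, direction)
planted += 0.1 * rng.normal(size=(n_emotions, d_model))

# "Layer" 1: pure noise, with no planted valence structure.
noise_only = rng.normal(size=(n_emotions, d_model))

rs = layerwise_valence_correlation([planted, noise_only], valence)
for layer, r in enumerate(rs):
    print(f"layer {layer}: |r| = {r:.2f}")
```

Scanning such a per-layer list is one way a depth profile like the one described above (strong early then collapsing, or present only at mid-depth) could be read off. The same routine could be pointed at arousal ratings and a different principal component, again as an assumption rather than the authors' procedure.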

The authors open-source their experiment code and dataset to facilitate reproducible investigation of emotion representations across different language model architectures.

These findings suggest that internal emotional representations generalize across different open-source LLM architectures, though their emergence and stability vary considerably with layer depth and extraction corpus.