This article investigates how language models learn latent semantic structure even though they are trained with one-hot labels, which in theory discard shared context statistics. The authors identify a tension between Neural Collapse theory and the observed ability of models to capture categorical features such as object properties.
- Balanced one-hot classification pushes class representations toward an equidistant configuration, regardless of how similar the inputs are.
- Language models still represent latent classes (e.g., medium-sized, rigid nouns) despite one-hot training regimes.
- The authors use three controlled synthetic settings in which inputs have latent semantic factors but are mapped to distinct labels.
- Semantic geometry emerges early in training, with representations clustering by shared attributes without explicit supervision.
- This structure is transient: given sufficient model capacity and training time, the representations converge to a symmetric state in which all of them are equally separated.
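To make the contrast between the two geometries concrete, the following is a minimal sketch rather than the authors' experimental setup. It assumes four classes defined by two binary latent attributes (the names "size" and "rigidity" are illustrative) and compares pairwise distances under a simplex equiangular tight frame, the symmetric configuration predicted by Neural Collapse, with a hypothetical "semantic" embedding that concatenates one-hot codes of the latent attributes. The helper names `simplex_etf`, `attribute_code`, and `pairwise_distances` are invented for this illustration.

```python
import itertools
import math


def simplex_etf(k):
    """Rows are k unit-norm class means forming a simplex equiangular tight frame."""
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def attribute_code(latent, n_values=2):
    """Hypothetical 'semantic' embedding: concatenated one-hot codes per latent attribute."""
    code = []
    for value in latent:
        code.extend(1.0 if v == value else 0.0 for v in range(n_values))
    return code


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of row indices."""
    return {
        (i, j): math.dist(vectors[i], vectors[j])
        for i, j in itertools.combinations(range(len(vectors)), 2)
    }


# Four classes, each a distinct combination of two binary latent attributes.
# The attribute names are illustrative assumptions, not taken from the study.
latents = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (size, rigidity)

collapsed = pairwise_distances(simplex_etf(len(latents)))
semantic = pairwise_distances([attribute_code(z) for z in latents])

for pair in collapsed:
    # Count how many latent attributes the two classes have in common.
    shared = sum(a == b for a, b in zip(latents[pair[0]], latents[pair[1]]))
    print(pair, f"shared={shared}",
          f"collapsed={collapsed[pair]:.3f}", f"semantic={semantic[pair]:.3f}")
```

In the collapsed geometry every pair of classes is the same distance apart, so shared attributes leave no trace. In the attribute-based geometry, classes that share one attribute sit closer together than classes that share none, which is the kind of clustering the summary describes as emerging early in training.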
The study proposes a preliminary modification to the unconstrained features model to capture this emergent semantic geometry observed during the phase transition.
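For context, one commonly used regularized form of the unconstrained features model is written out here. This is the standard formulation from the Neural Collapse literature, not the modification the study proposes, and the symbols are assumptions introduced for illustration: $K$ classes with $n$ samples each, $N = Kn$, classifier weights $W$ and bias $b$, freely optimized features $h_{k,i}$ collected in $H$, and regularization strengths $\lambda_W, \lambda_H, \lambda_b$.

$$
\min_{W,\,H,\,b}\;\; \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n} \mathcal{L}_{\mathrm{CE}}\!\left(W h_{k,i} + b,\; y_k\right) \;+\; \frac{\lambda_W}{2}\lVert W\rVert_F^2 \;+\; \frac{\lambda_H}{2}\lVert H\rVert_F^2 \;+\; \frac{\lambda_b}{2}\lVert b\rVert_2^2
$$

Because the features $h_{k,i}$ are free optimization variables, the inputs never appear in this objective, so nothing in it can reflect similarity between inputs. Any account of the emergent semantic geometry therefore has to reintroduce input structure in some way.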