Structure Before Collapse: Transient semantic geometry in next-token prediction
This article investigates how language models learn latent semantic structure even though they are trained with one-hot labels, which should in theory eliminate the statistics shared across contexts. The authors identify a tension between Neural Collapse theory, which describes the geometry that representations converge to late in training, and the observed ability of models to capture categorical features such as object properties.
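To make the tension concrete, here is a minimal sketch of the geometry usually associated with Neural Collapse: class means arranged as a simplex equiangular tight frame (ETF). It is an illustration under standard assumptions from the Neural Collapse literature, not the authors' code; the helper names `simplex_etf` and `cosine` and the choice of five classes are hypothetical. The point it shows is that in a fully collapsed configuration every pair of distinct classes has the same cosine similarity, so no class can sit closer to a semantically related one.

```python
import math


def simplex_etf(num_classes):
    """Build unit-norm simplex ETF vectors (hypothetical helper for illustration).

    Each row is a one-hot vector with the global mean subtracted,
    then rescaled to unit length.
    """
    k = num_classes
    rows = []
    for i in range(k):
        row = [(1.0 if j == i else 0.0) - 1.0 / k for j in range(k)]
        norm = math.sqrt(sum(x * x for x in row))
        rows.append([x / norm for x in row])
    return rows


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


K = 5  # assumed number of classes, chosen only for the example
means = simplex_etf(K)

# Cosine similarity for every ordered pair of distinct classes.
off_diagonal = [
    cosine(means[i], means[j])
    for i in range(K)
    for j in range(K)
    if i != j
]

# In a simplex ETF all of these equal -1 / (K - 1).
print(min(off_diagonal), max(off_diagonal))
```

Because all off-diagonal similarities coincide at -1 / (K - 1), a perfectly collapsed representation has no room to encode that two labels share a property. Any semantic structure a model exhibits must therefore come from a departure from this configuration.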