This article proposes using thermodynamic phase-transition theory, specifically the physics of material crystallization, to understand how language models change during alignment post-training. The authors argue that this physical framework provides a principled vocabulary for reasoning about these changes and about where alignment-induced structure originates.

  • The study identifies three distinct phases on random number generation tasks: a high-entropy liquid phase in pretrained models; a nucleation phase during supervised finetuning (SFT), in which behavior collapses onto a single seed distribution; and a settling phase, in which reinforcement learning (RL) redistributes probability while keeping it concentrated on the seed options.
  • The authors propose intuitive metrics for verifying transitions between these phases and validate them across a range of random generation tasks.
  • The authors suggest that importing physical frameworks like crystallization can help answer fundamental questions about why alignment converges where it does and what it cannot change.
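The summary does not say which metrics the authors use, so the following is only a minimal sketch under an assumption: that phase transitions can be tracked with distribution-level statistics such as normalized Shannon entropy (for the liquid phase), top-k mass (for collapse during nucleation), and the mass kept on a fixed set of seed outputs (for settling). The function names and the toy samples below are hypothetical and illustrative, not the paper's method or results.

```python
from collections import Counter
import math


def normalized_entropy(samples, num_options):
    """Shannon entropy of the empirical output distribution divided by
    log(num_options): 1.0 is uniform over all options, 0.0 is a single output."""
    counts = Counter(samples)
    total = len(samples)
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(num_options)


def top_k_mass(samples, k):
    """Fraction of samples taken by the k most frequent outputs."""
    counts = Counter(samples)
    return sum(c for _, c in counts.most_common(k)) / len(samples)


def seed_mass(samples, seeds):
    """Fraction of samples that land inside a fixed set of 'seed' outputs."""
    seeds = set(seeds)
    return sum(1 for s in samples if s in seeds) / len(samples)


# Hypothetical toy samples for "pick a random number from 1 to 10"
# (synthetic, chosen only to mimic the three described phases).
base = list(range(1, 11)) * 10        # liquid: roughly uniform
sft = [7] * 90 + [3] * 10             # nucleation: collapse onto a seed
rl = [7] * 50 + [3] * 40 + [5] * 10   # settling: mass moves, but stays on seeds

# Seeds taken (by assumption) as the outputs the SFT checkpoint produced.
seeds = set(sft)

for name, s in [("base", base), ("sft", sft), ("rl", rl)]:
    print(name,
          round(normalized_entropy(s, 10), 3),
          top_k_mass(s, 1),
          seed_mass(s, seeds))
```

In this toy setup, entropy drops sharply from base to SFT and partially recovers under RL, while the share of samples on the SFT seeds stays high, which is one way to operationalize "redistributes probability while maintaining concentration on the seed options."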

This approach aims to provide researchers with better tools to understand the origins and limitations of alignment-induced structure in language models.