This paper reinterprets Large Language Models as high-dimensional Dense Associative Memories, in which correct reasoning corresponds to deep attractor basins in the energy landscape. The authors introduce a retrieval mechanism that samples multiple reasoning paths and weights each by a Boltzmann factor of its energy, so that lower-energy paths count more, to approximate the equilibrium distribution.
- Correct reasoning chains are modeled as deep, wide attractor basins, while hallucinations are treated as sharp, unstable local minima.
- A Gibbs measure of spectral entropy is used to weight reasoning trajectories by a Boltzmann factor ($P \propto e^{-\beta E}$, where $\beta$ is the inverse temperature), so that lower-energy trajectories receive exponentially more weight.
- This physics-inspired mechanism improves Microsoft Phi-3.5 performance on GSM8K from 84.7% to 90.1%, a reported gain of 5.38 percentage points.
The study argues that inference is more accurately modeled as a dynamic settling process into an attractor basin than as greedy next-token prediction.
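
The energy-weighted aggregation summarized above can be illustrated with a minimal sketch. It assumes each sampled reasoning path has already been reduced to a final answer and a scalar energy $E$; how the paper computes that energy is not specified here, so the energies below are made-up placeholders, and the function name `boltzmann_vote` is hypothetical. Each path contributes weight $e^{-\beta E}$ to its answer, and the normalized weights approximate $P(\text{answer}) \propto \sum e^{-\beta E}$.

```python
import math
from collections import defaultdict


def boltzmann_vote(paths, beta=1.0):
    """Aggregate sampled reasoning paths by Boltzmann weight.

    paths: list of (answer, energy) pairs (energies are assumed inputs).
    Returns (best_answer, probs), where probs maps answer -> normalized weight.
    """
    if not paths:
        raise ValueError("need at least one sampled path")
    # Shift by the minimum energy for numerical stability;
    # the shift cancels when the weights are normalized.
    e_min = min(energy for _, energy in paths)
    weights = defaultdict(float)
    for answer, energy in paths:
        weights[answer] += math.exp(-beta * (energy - e_min))
    total = sum(weights.values())
    probs = {answer: w / total for answer, w in weights.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Illustrative, made-up samples: (final answer, energy of the path).
samples = [("42", 1.0), ("42", 1.2), ("17", 3.0), ("42", 0.9), ("17", 2.5)]
best, probs = boltzmann_vote(samples, beta=2.0)
print(best, probs)
```

With `beta=0` every path gets equal weight and the scheme reduces to plain majority voting; larger `beta` concentrates the weight on the lowest-energy paths, so a single low-energy path can outvote several high-energy ones.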