This study investigates statistical learning and mental representation in neural language models (NLMs) by training generative Transformer models on a synthetic grammar and analyzing their internal representations at various stages of training.

  • NLMs first acquire the most abstract, global statistical knowledge at the beginning of learning, and only later acquire relatively local statistical dependencies.
  • The learning path begins with many over-generalizations, which are gradually constrained in the later stages.
  • A new framework is proposed to explain the statistical learning and language cognition of NLMs based on these observations.