Developmental approach reveals the statistical learning of Neural Language Models: Transformers generalize from the most abstract statistical patterns

This study investigates the statistical learning and mental representations of neural language models (NLMs) by training generative Transformer models on a synthetic grammar and analyzing their internal representations at successive stages of training.
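To make "training on a synthetic grammar" concrete, here is a minimal sketch of how a training corpus can be sampled from a probabilistic context-free grammar. The abstract does not specify the grammar used in the study. The rules, their probabilities, and the helper names `GRAMMAR`, `expand` and `sample_corpus` are hypothetical illustrations, not the authors' setup.

```python
import random

# Hypothetical toy grammar (NOT the grammar used in the study): each
# nonterminal maps to a list of (probability, right-hand side) pairs.
GRAMMAR = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["Det", "N"]), (0.3, ["Det", "Adj", "N"])],
    "VP": [(0.6, ["V", "NP"]), (0.4, ["V"])],
    "Det": [(0.5, ["the"]), (0.5, ["a"])],
    "Adj": [(0.5, ["small"]), (0.5, ["red"])],
    "N": [(0.5, ["cat"]), (0.5, ["ball"])],
    "V": [(0.5, ["sees"]), (0.5, ["chases"])],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of terminal tokens."""
    if symbol not in GRAMMAR:  # terminals are returned unchanged
        return [symbol]
    weights = [p for p, _ in GRAMMAR[symbol]]
    options = [rhs for _, rhs in GRAMMAR[symbol]]
    # Pick one right-hand side according to the rule probabilities.
    rhs = rng.choices(options, weights=weights, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(expand(child, rng))
    return tokens


def sample_corpus(n, seed=0):
    """Sample n sentences from GRAMMAR with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [" ".join(expand("S", rng)) for _ in range(n)]


if __name__ == "__main__":
    for sentence in sample_corpus(5):
        print(sentence)
```

Because the toy grammar is non-recursive, every sentence has between 3 and 7 tokens. A grammar of the kind used to probe global versus local dependencies would typically include recursion or long-distance agreement, which this sketch omits for brevity.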

The results show that NLMs acquire the most abstract, global statistical knowledge at the beginning of learning and only later acquire relatively local statistical dependencies.
Their learning path begins with extensive over-generalization, which is gradually constrained in later stages.
Based on these observations, the study proposes a new framework to explain the statistical learning and language cognition of NLMs.