This article develops a quantitative theory for the Random Language Model (RLM) in a scaling limit where the number of hidden symbols N approaches infinity while the grammar temperature approaches zero at a fixed ratio. The study establishes that the model admits a controlled description based on a large-deviation principle over rule-usage patterns, which maps the problem to Random Energy Models with nontrivial combinatorics.
- The RLM exhibits a condensation transition at the critical value x = 1/8, below which rule usage concentrates and language statistics depend on corpus length.
- A second characteristic scale at x = 1/2 marks the onset of entropy reduction from its maximal value.
- Explicit scaling laws are derived for the number of distinct rules, entropy, and related observables across scaling, saturation, and critical regimes.
The theory resolves previous ambiguities regarding the existence of a thermodynamic transition and attributes the slow approach to the large-N limit to a dependence on log N. It provides a unified framework in which universal statistical properties of language emerge from typical realizations of generative grammars.
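
To give a feel for what condensation means in a Random Energy Model, the sketch below simulates the standard textbook REM (Derrida's model with independent Gaussian energy levels), not the RLM itself or the article's mapping. The function names, the system size, and the sample count are illustrative assumptions. The participation ratio Y, the sum of squared Boltzmann weights, is close to zero when weight is spread over exponentially many levels and of order one when a few levels dominate.

```python
import numpy as np

# Critical inverse temperature of the standard REM with energy variance n/2.
# This is a property of the textbook REM, not a quantity taken from the article.
BETA_C = 2.0 * np.sqrt(np.log(2.0))


def rem_participation_ratio(n_spins, beta, rng):
    """Participation ratio Y = sum_i p_i^2 for one disorder sample of the REM."""
    n_levels = 2 ** n_spins
    # Independent Gaussian energies with variance n_spins / 2.
    energies = rng.normal(0.0, np.sqrt(n_spins / 2.0), size=n_levels)
    log_weights = -beta * energies
    log_weights -= log_weights.max()  # subtract the max to avoid overflow
    weights = np.exp(log_weights)
    probs = weights / weights.sum()
    return float(np.sum(probs ** 2))


def mean_participation(n_spins, beta, n_samples=20, seed=0):
    """Average Y over independent disorder samples (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    samples = [rem_participation_ratio(n_spins, beta, rng) for _ in range(n_samples)]
    return float(np.mean(samples))


if __name__ == "__main__":
    for ratio in (0.3, 1.0, 3.0):
        y = mean_participation(n_spins=14, beta=ratio * BETA_C)
        print(f"beta / beta_c = {ratio:.1f}  ->  mean Y = {y:.4f}")
```

Running the script at a few values of beta / beta_c shows Y moving from near zero to order one as the temperature is lowered. In the RLM the analogous statement, per the summary above, concerns rule usage concentrating on a small set of rules below x = 1/8.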