Tapered Language Models Improve Performance

Tapered Language Models (TLMs) allocate more parameters to earlier layers and fewer to later ones, reducing perplexity and boosting benchmark performance across architectures. This depth-aware capacity allocation improves language model outputs without adding compute or parameters, offering a simple, universal design principle.