Smooth Scaling Laws Hide Stepwise Token Learning
This study presents a token-level framework that decomposes language model scaling laws into localized learning events of individual contextualized tokens, challenging the view that heavy-tailed pattern difficulty is the sole cause.