Muown Implicitly Performs Angular Step-size Decay

The article demonstrates that Muown's directional update is equivalent to a Riemannian step on normalized directions, in which the magnitude of the un-normalized parameters modulates the angular step size. This insight explains Muown's step-size stability and motivates AngularMuown, which optimizes directly over normalized directions with an explicit, schedulable angular multiplier.
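
As a rough illustration of how a parameter's magnitude can modulate an angular step, the sketch below applies the same fixed-norm update, orthogonal to the current direction, to vectors of different radii and measures how far the direction rotates. This is generic geometry, not Muown's actual update rule; the names `angular_change`, `direction`, and `tangent_step` are hypothetical and chosen only for this example.

```python
import math


def angular_change(w, step):
    """Return the angle in radians between w and w + step."""
    new = [a + b for a, b in zip(w, step)]
    dot = sum(a * b for a, b in zip(w, new))
    norm_w = math.sqrt(sum(a * a for a in w))
    norm_new = math.sqrt(sum(a * a for a in new))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_w * norm_new)))
    return math.acos(cosine)


# Assumption for illustration: a unit direction and a fixed-norm update
# that is orthogonal to it (a purely "angular" update).
direction = [1.0, 0.0]
tangent_step = [0.0, 0.1]

for radius in (0.5, 1.0, 2.0, 4.0):
    w = [radius * d for d in direction]
    angle = angular_change(w, tangent_step)
    # For an orthogonal step of norm s, the rotation is atan(s / radius).
    print(f"radius={radius:4.1f}  angle={angle:.5f}  atan={math.atan(0.1 / radius):.5f}")
```

In this toy setting the rotation equals atan(s / r) for a step of norm s and radius r, so the same update turns a larger-magnitude vector by a smaller angle.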

AngularMuown decouples the angular multiplier from the radial magnitude update, so the angular step size is set explicitly rather than implied by the parameter magnitude.
The method outperforms Muown and leads the per-optimizer category in the modded nanoGPT speedrunning competition.
Experiments on Qwen2-0.5B and on 1.1B-parameter mixture-of-experts models confirm that the algorithm scales beyond small models.
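
The idea of stepping over normalized directions with an explicit, schedulable angle can be sketched on a single unit vector as follows. This is a minimal sketch under stated assumptions, not the article's AngularMuown update: it uses a plain gradient projected onto the tangent space of the unit sphere, rotates by an exactly specified angle, and leaves the radial magnitude to be updated separately. The functions `angular_step` and `cosine_schedule` and the choice of a cosine schedule are hypothetical.

```python
import math


def angular_step(u, grad, angle):
    """Rotate unit vector u by `angle` radians along the tangent descent direction."""
    # Remove the radial component of the gradient (projection onto the tangent space at u).
    radial = sum(a * g for a, g in zip(u, grad))
    tangent = [g - radial * a for a, g in zip(u, grad)]
    t_norm = math.sqrt(sum(t * t for t in tangent))
    if t_norm == 0.0:
        return list(u)  # Gradient is purely radial: the direction does not change.
    # Unit descent direction in the tangent space.
    t_hat = [-t / t_norm for t in tangent]
    # Move along the great circle through u and t_hat; the result stays unit norm.
    return [math.cos(angle) * a + math.sin(angle) * t for a, t in zip(u, t_hat)]


def cosine_schedule(step, total_steps, peak_angle):
    """Hypothetical schedule: decay the angular step from peak_angle to 0."""
    return 0.5 * peak_angle * (1.0 + math.cos(math.pi * step / total_steps))


u = [1.0, 0.0]
grad = [0.3, -2.0]  # made-up gradient for illustration
for step in range(3):
    u = angular_step(u, grad, cosine_schedule(step, total_steps=3, peak_angle=0.1))
```

Because the rotation angle is passed in directly, the amount the direction turns at each step does not depend on the gradient's scale or on any separately tracked radius.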

AngularMuown provides more explicit control over angular step sizes, offering improved optimization stability and performance when pre-training Transformers.