Hyperball is a simple optimizer wrapper that holds the Frobenius norm of each weight matrix, and of each update applied to it, at a fixed value. It improves training speed and learning-rate transfer in large models, achieving a 20–30% token-equivalent speedup over weight-decay baselines on models with up to 1.2B parameters.
Hyperball Optimization for Faster Language Model Training
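The summary describes the mechanism only at a high level, so the following is a minimal sketch of one way a norm-fixing wrapper step could look, not the authors' implementation. The function names `frobenius_normalize` and `hyperball_style_step`, the choice to scale the update to `lr * weight_norm`, and the projection back onto the sphere after the step are all assumptions made for illustration.

```python
import numpy as np


def frobenius_normalize(matrix, eps=1e-12):
    """Scale a matrix to (approximately) unit Frobenius norm."""
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return matrix / (np.linalg.norm(matrix) + eps)


def hyperball_style_step(weight, raw_update, weight_norm, lr):
    """One norm-constrained step (illustrative assumption, see lead-in).

    weight_norm: fixed Frobenius norm R at which the weight is held.
    raw_update:  direction proposed by any base optimizer (e.g. SGD, Adam).
    """
    # Give the update a fixed Frobenius norm of lr * R,
    # independent of the raw update's own scale.
    step = lr * weight_norm * frobenius_normalize(raw_update)
    # Apply the step, then rescale the weight back to norm R.
    return weight_norm * frobenius_normalize(weight - step)


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
R = np.linalg.norm(W)                  # norm fixed at its initial value
U = 1000.0 * rng.normal(size=(4, 3))   # deliberately badly scaled update
W_next = hyperball_style_step(W, U, R, lr=0.1)
```

In this sketch the weight norm stays at `R` after the step, and multiplying `U` by any positive constant leaves `W_next` essentially unchanged, so the learning rate alone controls the relative step size.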