UFP4: Uniform 4-Bit Training Overcomes Shrinkage Bias in LLM Pretraining
A study identifies shrinkage bias in E2M1-based FP4 formats due to geometric asymmetry, causing multiplicative error accumulation and training instability. The proposed UFP4 recipe uses uniform E1M2/INT4 grids and applies Random Hadamard Transform to all GEMMs, achieving lower loss degradation than E2M1 baselines in large-scale LLM pretraining. The authors recommend E1M2/INT4 as a first-class training primitive for future accelerators.