Mechanism-Driven Monitors for Preemptive Detection of LLM Training Instability

This article introduces mechanism-driven monitors that detect large language model training instability before it causes significant damage. The monitors derive internal signals from the functional role of each critical module, and they identify failures thousands of steps earlier than traditional loss-based methods.

For low-precision flash attention, the method monitors the spectral entropy of a QK bilinear decomposition, which becomes abnormal before the loss fully collapses.
Indicators for Mixture-of-Experts (MoE) routers are derived from their specific role in expert selection.
Fault-injection experiments on low-precision attention, large learning rates, and combined faults demonstrate that these signals provide distinct signatures for different failure types.
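
To make the attention monitor concrete, the following is a minimal sketch of one way a spectral entropy could be computed for a single head. The summary above does not give the exact definition, so everything here is an assumption: the bilinear form is taken to be W_Q W_K^T, the spectrum is its singular values normalized to sum to one, and the function name `spectral_entropy` is hypothetical.

```python
import numpy as np


def spectral_entropy(w_q, w_k, eps=1e-12):
    """Normalized Shannon entropy of the singular values of W_Q @ W_K.T.

    Assumption: w_q and w_k are (d_model, d_head) projection matrices for
    one attention head, with d_head > 1. The article's exact definition
    may differ from this one.
    """
    bilinear = w_q @ w_k.T                       # (d_model, d_model) QK form
    s = np.linalg.svd(bilinear, compute_uv=False)
    p = s / (s.sum() + eps)                      # spectrum as a distribution
    p = p[p > eps]                               # drop numerically zero modes
    entropy = -np.sum(p * np.log(p))
    # The rank of the product is bounded by the smallest dimension involved.
    max_rank = min(w_q.shape[0], w_q.shape[1], w_k.shape[0])
    return float(entropy / np.log(max_rank))     # 1.0 = flat, 0.0 = rank one
```

Under these assumptions, a value near 1 means the head spreads its energy evenly across directions, and a value near 0 means the spectrum has collapsed onto a single direction. A monitor would log this value per layer and watch for departures from its usual range.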

These monitors enable preemptive detection of numerical or hyperparameter faults while loss and gradient norms still appear normal, which could save the substantial compute otherwise spent on a run that is already failing.
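
As an illustration of how such a signal might be turned into an early warning, here is a minimal sketch that compares each new value against a rolling baseline of the same signal. The article summary does not describe its alerting rule, so the z-score test, the class name `DriftAlarm`, and the default window and threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


class DriftAlarm:
    """Flags a monitored signal that leaves its own recent baseline.

    Hypothetical helper: the thresholding rule is an assumption, not the
    article's method.
    """

    def __init__(self, window=200, threshold=4.0, min_history=20):
        self.history = deque(maxlen=window)  # recent non-anomalous values
        self.threshold = threshold           # allowed deviation in std units
        self.min_history = min_history       # warm-up length before alerting

    def update(self, value):
        """Return True if `value` deviates from the rolling baseline."""
        alarm = False
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = max(stdev(self.history), 1e-8)  # guard a flat baseline
            alarm = abs(value - mu) > self.threshold * sigma
        if not alarm:
            self.history.append(value)  # keep anomalies out of the baseline
        return alarm
```

One instance per monitored signal could be updated at every logging step. Because the baseline comes from the signal's own history, the alarm does not depend on the loss or the gradient norm having moved yet.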