Conservation Laws for Modern Neural Architectures

This paper introduces a unified framework for identifying conservation laws under gradient flow in modern neural architectures. It covers feedforward networks with GELU, SiLU, and SwiGLU activations, multi-head attention with sinusoidal and rotary positional encodings, and Mixture-of-Experts models under various gating schemes. Experiments validate the predicted invariants.
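
To make concrete what a conservation law under gradient flow looks like, the sketch below checks a classical invariant numerically. It is an illustration only, not the framework of this paper. It uses a two-layer ReLU network, which is not among the architectures listed above, because the well-known balancedness quantity ||w_j||^2 - a_j^2 is conserved per hidden neuron j in that setting. The data, network sizes, step size, and the names `balancedness` and `run` are all assumptions chosen for the example. Gradient flow is approximated by explicit Euler steps (plain gradient descent with a small learning rate), so the quantity is conserved only up to a drift of order lr^2 per step.

```python
import numpy as np

def balancedness(W, a):
    # Per-neuron quantity ||w_j||^2 - a_j^2 (classical ReLU invariant).
    return np.sum(W**2, axis=1) - a**2

def run(steps=1000, lr=1e-3, seed=0):
    # Hypothetical toy setup: random data, 8 hidden ReLU units.
    rng = np.random.default_rng(seed)
    n, d, h = 64, 5, 8
    X = rng.normal(size=(n, d))
    y = rng.normal(size=n)
    W = 0.5 * rng.normal(size=(h, d))   # input weights, one row per neuron
    a = 0.5 * rng.normal(size=h)        # output weights

    q0 = balancedness(W, a)
    for _ in range(steps):
        pre = X @ W.T                   # pre-activations, shape (n, h)
        act = np.maximum(pre, 0.0)      # ReLU
        resid = act @ a - y             # residuals of the squared loss
        # Gradients of L = (1 / 2n) * sum(resid^2)
        grad_a = act.T @ resid / n
        grad_W = (resid[:, None] * (pre > 0) * a[None, :]).T @ X / n
        # Explicit Euler step approximating gradient flow
        W -= lr * grad_W
        a -= lr * grad_a
    return q0, balancedness(W, a)

if __name__ == "__main__":
    q0, q1 = run()
    print("max drift of invariant:", np.max(np.abs(q1 - q0)))
```

Shrinking `lr` while keeping `steps * lr` fixed should reduce the reported drift, which is consistent with the quantity being exactly conserved in the continuous-time limit.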