Residual Connections Mitigate Gradient Issues in Deep Networks
A study uses multiplicative ergodic theory to analyze exploding and vanishing gradients in deep neural networks. It shows that residual connections modify the Liapunov spectrum, in the sense of Furstenberg and Kifer, and thereby stabilize gradient flow during training.
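The following illustrates the underlying idea. It is not the study's own method. Backpropagated gradients are products of per-layer Jacobians, and the top Liapunov exponent λ₁ of such a product sets their asymptotic growth rate: λ₁ > 0 means gradients explode with depth, and λ₁ < 0 means they vanish. The sketch estimates λ₁ by renormalized power iteration. It assumes linear layers whose Jacobians are random Gaussian matrices, with J = W for a plain layer and J = I + W for a residual layer. The function names and the parameters `scale`, `depth`, and `seed` are hypothetical choices for this example.

```python
import math
import random


def gaussian_matrix(d: int, scale: float, rng: random.Random) -> list[list[float]]:
    """Hypothetical layer Jacobian: d x d with i.i.d. N(0, scale**2 / d) entries."""
    std = scale / math.sqrt(d)
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(d)]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_lyapunov_exponent(d: int, depth: int, scale: float,
                          residual: bool, seed: int = 0) -> float:
    """Estimate lambda_1 = lim (1/n) log ||J_n ... J_1 v|| for random Jacobians.

    Plain layer:    J = W
    Residual layer: J = I + W   (identity skip connection)
    The vector is renormalized every step so the product never overflows or underflows.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    v = [x / n for x in v]
    log_growth = 0.0
    for _ in range(depth):
        wv = matvec(gaussian_matrix(d, scale, rng), v)
        # The residual branch adds the identity path: (I + W) v = v + W v.
        u = [a + b for a, b in zip(v, wv)] if residual else wv
        n = math.sqrt(sum(x * x for x in u))
        log_growth += math.log(n)  # accumulate the per-layer log growth
        v = [x / n for x in u]
    return log_growth / depth


if __name__ == "__main__":
    for label, scale, residual in [("plain, scale=2.0", 2.0, False),
                                   ("plain, scale=0.5", 0.5, False),
                                   ("residual, scale=0.1", 0.1, True)]:
        lam = top_lyapunov_exponent(d=16, depth=500, scale=scale, residual=residual)
        print(f"{label}: lambda_1 ~ {lam:+.3f}")
```

For the plain layers, the estimate should land near `log(scale)`. That is clearly positive for `scale=2.0` and clearly negative for `scale=0.5`. The residual layers with a small `scale` keep the estimate close to zero. Only the top exponent is computed here. Recovering the full spectrum would require a QR-based reorthonormalization scheme instead of a single vector.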