This paper analyzes the discontinuities of Sparse Mixture-of-Experts models. It classifies them by order and shows that lower-order discontinuities dominate in volume. It proves that a random input path almost surely hits an order-1 discontinuity first, with accompanying finite-time probability bounds, and derives occupation-time bounds for each order. It also proposes a simple smoothing mechanism that improves model continuity and performance with minimal computational overhead.
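As a minimal illustration of why hard expert routing is discontinuous at all, the sketch below builds a toy top-1 router over a scalar input and measures the output jump across the point where two gate scores tie. Everything in it is an assumption made for illustration: the names `top1_moe`, `GATES`, `EXPERTS`, and `jump_at_boundary`, the linear gate scores, and the two linear experts are hypothetical and are not taken from the paper. The paper's own definition of discontinuity order and its smoothing mechanism are not reproduced here.

```python
def top1_moe(x, gates, experts):
    """Route the scalar input x to the single expert with the highest gate score."""
    # Each gate is a hypothetical linear score s_i(x) = w * x + b.
    scores = [w * x + b for (w, b) in gates]
    winner = max(range(len(scores)), key=scores.__getitem__)
    return experts[winner](x)


# Two hypothetical gates, s_0(x) = x and s_1(x) = -x, which tie at x = 0.
GATES = [(1.0, 0.0), (-1.0, 0.0)]

# Two hypothetical experts that disagree at the tie point: f_0(0) = 1, f_1(0) = -1.
EXPERTS = [lambda x: 2.0 * x + 1.0, lambda x: -x - 1.0]


def jump_at_boundary(eps=1e-6):
    """Difference between the outputs just right and just left of the tie at x = 0."""
    right = top1_moe(eps, GATES, EXPERTS)    # expert 0 wins for x > 0
    left = top1_moe(-eps, GATES, EXPERTS)    # expert 1 wins for x < 0
    return right - left


if __name__ == "__main__":
    # The jump stays close to f_0(0) - f_1(0) = 2 however small eps is.
    print(jump_at_boundary())
```

In this toy setting the jump does not shrink as `eps` goes to zero, because the selected expert switches at the tie and the two experts disagree there. Any smoothing scheme would have to blend the experts near such a boundary; how the paper does this is not specified in the summary above.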
Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts