VRA-FedSGD: Variance-Reduced Federated Learning for Heavy-Tailed Noise
The authors propose VRA-FedSGD, a variance-reduction-based algorithm for federated learning in settings where both the stochastic gradients and the communication channel are corrupted by heavy-tailed noise. Such noise is a prevalent challenge in large-scale machine learning over wireless networks and in Internet of Things (IoT) deployments. The method combines momentum-based variance reduction with a nonlinear mapping to mitigate heavy-tailed gradient noise, and it uses a variance-reduced aggregation mechanism to suppress heavy-tailed communication noise. For nonconvex objectives, VRA-FedSGD achieves a convergence rate in mean of O(K^(-(p-1)/(2p-1))) after K iterations, where p is the tail index of the noise. For strongly convex objectives, it attains an almost sure convergence rate of Õ(K^(-(1-1/(p-ε)))), where ε > 0 is an arbitrarily small constant. Simulations of logistic regression on real-world data confirm the algorithm's effectiveness.
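The two ingredients named above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm: the summary does not specify the nonlinear mapping, so norm clipping is assumed; momentum variance reduction is written as a STORM-style recursion that reuses one noise sample at the current and previous iterates; heavy-tailed noise is drawn from a Student-t distribution with 1.5 degrees of freedom (infinite variance); and the server's "variance-reduced aggregation" is replaced by a hypothetical decaying running average. The names `clip`, `rate_exponent`, `run_sketch` and all parameter values are illustrative only.

```python
import numpy as np

def clip(v, tau):
    """Norm clipping (assumed nonlinear mapping): rescale v so that ||v|| <= tau."""
    norm = np.linalg.norm(v)
    return v if norm <= tau else v * (tau / norm)

def rate_exponent(p):
    """Exponent e in the nonconvex mean rate O(K^(-e)), with e = (p-1)/(2p-1)."""
    return (p - 1) / (2 * p - 1)

def run_sketch(centers, K=2000, lr=0.05, beta=0.1, tau=1.0, df=1.5,
               grad_scale=1.0, chan_scale=0.01, seed=0):
    """Toy federated run on f_i(x) = 0.5 * ||x - c_i||^2 with heavy-tailed noise.

    Hypothetical choices: Student-t noise, STORM-style momentum, norm clipping,
    and a decaying running average at the server in place of the paper's rule.
    """
    rng = np.random.default_rng(seed)
    n, dim = centers.shape
    x_prev = x = np.zeros(dim)
    d = np.zeros((n, dim))  # per-client momentum (variance-reduced) estimates
    z = np.zeros(dim)       # server-side aggregated direction
    for k in range(K):
        msgs = np.empty((n, dim))
        for i in range(n):
            xi = grad_scale * rng.standard_t(df, size=dim)  # heavy-tailed gradient noise
            g_now = (x - centers[i]) + xi                    # stochastic gradient at x_k
            if k == 0:
                d[i] = g_now                                 # initialize with first gradient
            else:
                g_old = (x_prev - centers[i]) + xi           # same sample at x_{k-1}
                d[i] = g_now + (1 - beta) * (d[i] - g_old)   # STORM-style recursion
            msgs[i] = clip(d[i], tau)                        # nonlinear mapping before upload
        # Heavy-tailed channel noise corrupts each uploaded message.
        received = msgs + chan_scale * rng.standard_t(df, size=(n, dim))
        gamma = 1.0 / (k + 1) ** 0.6                         # hypothetical decaying weight
        z = (1 - gamma) * z + gamma * received.mean(axis=0)  # running-average aggregation
        x_prev, x = x, x - lr * z
    return x

# Centers are symmetric about (3, 0), so the clipped mean field vanishes at their average.
centers = np.array([[2.0, 1.0], [4.0, -1.0], [3.0, 0.0], [1.0, 2.0], [5.0, -2.0]])
x_final = run_sketch(centers)
print("final x:", x_final, "target:", centers.mean(axis=0))
print("rate exponent at p = 2:", rate_exponent(2.0))
```

Here `rate_exponent(2.0)` evaluates to 1/3, and the exponent shrinks toward 0 as p approaches 1, reflecting slower guaranteed convergence under heavier tails. The symmetric centers are deliberate: with heterogeneous clients, norm clipping can shift the fixed point away from the minimizer of the average objective, which is one reason the choice of nonlinear mapping matters.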