Paper Title

Accelerated Federated Learning with Decoupled Adaptive Optimization

Paper Authors

Jiayin Jin, Jiaxiang Ren, Yang Zhou, Lingjuan Lyu, Ji Liu, Dejing Dou

Paper Abstract

The federated learning (FL) framework enables edge clients to collaboratively learn a shared inference model while keeping their training data private on the clients. Recently, many heuristic efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, AdaGrad, etc., to federated settings to improve convergence and accuracy. However, there is still a paucity of theoretical principles on where and how to design and utilize adaptive optimization methods in federated settings. This work aims to develop novel adaptive optimization methods for FL from the perspective of the dynamics of ordinary differential equations (ODEs). First, an analytic framework is established to build a connection between federated optimization methods and decompositions of the ODEs of the corresponding centralized optimizers. Second, based on this analytic framework, a momentum-decoupling adaptive optimization method, FedDA, is developed to fully utilize the global momentum on each local iteration and accelerate the training convergence. Last but not least, full-batch gradients are used to mimic centralized optimization at the end of the training process to ensure convergence and overcome the possible inconsistency caused by adaptive optimization methods.
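
Since the abstract describes FedDA only at a high level, the sketch below illustrates the general idea of decoupled momentum in a federated local round: a server-maintained momentum buffer is reused on every local iteration rather than being applied only at aggregation time. This is an illustrative PyTorch-style sketch under assumed names and hyperparameters (`fedda_style_local_update`, `lr`, `beta`, `local_steps` are all placeholders), not the paper's published update rule.

```python
import torch
import torch.nn.functional as F

def fedda_style_local_update(model, loader, global_momentum,
                             lr=0.01, beta=0.9, local_steps=5):
    """Hypothetical sketch of a momentum-decoupled local round: the momentum
    buffer handed down by the server steers every local step, instead of
    momentum being applied only when client updates are aggregated."""
    # Start from the server's ("global") momentum buffer, decoupled from
    # any locally accumulated gradient history.
    momentum = {name: buf.clone() for name, buf in global_momentum.items()}
    for _, (x, y) in zip(range(local_steps), loader):
        loss = F.cross_entropy(model(x), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for name, p in model.named_parameters():
                # Blend the carried-over global momentum with the fresh
                # local gradient at every local iteration.
                momentum[name] = beta * momentum[name] + p.grad
                p -= lr * momentum[name]
    # Return the updated local weights and momentum for server aggregation.
    return model.state_dict(), momentum
```

The abstract's final phase, in which full-batch gradients mimic centralized optimization near the end of training, would correspond to running the same loop with each client's whole local dataset as a single batch; that detail is likewise omitted from this sketch.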
