Paper Title
Network Gradient Descent Algorithm for Decentralized Federated Learning
Paper Authors
Paper Abstract
We study a fully decentralized federated learning algorithm, a novel gradient descent algorithm executed on a communication-based network. For convenience, we refer to it as the network gradient descent (NGD) method. In the NGD method, only statistics (e.g., parameter estimates) need to be communicated, which minimizes privacy risks. Meanwhile, different clients communicate with each other directly according to a carefully designed network structure, without a central master. This greatly enhances the reliability of the entire algorithm. These appealing properties motivate us to study the NGD method carefully, both theoretically and numerically. Theoretically, we start with a classical linear regression model. We find that both the learning rate and the network structure play significant roles in determining the statistical efficiency of the NGD estimator. The resulting NGD estimator can be statistically as efficient as the global estimator, provided the learning rate is sufficiently small and the network structure is well balanced, even when the data are heterogeneously distributed. These interesting findings are then extended to general models and loss functions. Extensive numerical studies are presented to corroborate our theoretical findings. Classical deep learning models are also considered for illustration purposes.
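As a concrete, simplified illustration of the kind of update the abstract describes, the following is a minimal sketch of a network-gradient-descent-style iteration on linear regression: each client averages its neighbors' parameter estimates according to a network weight matrix and then takes a gradient step on its own local least-squares loss. The specific update order, the ring network, the weight matrix W, the learning rate, and all function and variable names are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
import numpy as np

def ngd_linear_regression(X_list, y_list, W, lr=0.05, n_iter=500):
    """Sketch of a decentralized (network) gradient descent loop.

    X_list, y_list : per-client design matrices and response vectors.
    W              : row-stochastic network weight matrix; W[i, j] > 0 only
                     when client j is a neighbor of client i (assumed form).
    """
    n_clients = len(X_list)
    p = X_list[0].shape[1]
    theta = np.zeros((n_clients, p))  # one parameter estimate per client

    for _ in range(n_iter):
        # Communication step: each client averages its neighbors' estimates.
        theta_avg = W @ theta
        # Local computation step: each client takes a gradient step on its
        # own least-squares loss using only its own data.
        for i in range(n_clients):
            X, y = X_list[i], y_list[i]
            grad = X.T @ (X @ theta_avg[i] - y) / len(y)
            theta[i] = theta_avg[i] - lr * grad
    return theta

# Toy example: 4 clients on a ring network with heterogeneous sample sizes.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])
X_list, y_list = [], []
for n in (50, 80, 120, 60):
    X = rng.normal(size=(n, 3))
    y = X @ true_theta + 0.1 * rng.normal(size=n)
    X_list.append(X)
    y_list.append(y)

# Row-stochastic weights for a ring: each client averages itself and its two neighbors.
W = np.array([
    [1/3, 1/3, 0.0, 1/3],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 1/3, 1/3, 1/3],
    [1/3, 0.0, 1/3, 1/3],
])

estimates = ngd_linear_regression(X_list, y_list, W)
print(estimates.round(3))  # each client's row should be close to true_theta
```

In this sketch the balance of the network enters through W: a doubly stochastic weight matrix (as in the ring above) keeps every client's data equally weighted in the long run, which is one plausible reading of the "well balanced" network structure condition in the abstract.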