Paper Title
Recycling Model Updates in Federated Learning: Are Gradient Subspaces Low-Rank?
Paper Authors
Paper Abstract
In this paper, we question the rationale behind propagating large numbers of parameters through a distributed system during federated learning. We start by examining the rank characteristics of the subspace spanned by gradients across epochs (i.e., the gradient-space) in centralized model training, and observe that this gradient-space often consists of a few leading principal components accounting for an overwhelming majority (95-99%) of the explained variance. Motivated by this, we propose the "Look-back Gradient Multiplier" (LBGM) algorithm, which exploits this low-rank property to enable gradient recycling between model update rounds of federated learning, reducing transmissions of large parameters to single scalars for aggregation. We analytically characterize the convergence behavior of LBGM, revealing the nature of the trade-off between communication savings and model performance. Our subsequent experimental results demonstrate the improvement LBGM obtains in communication overhead compared to conventional federated learning on several datasets and deep learning models. Additionally, we show that LBGM is a general plug-and-play algorithm that can be used standalone or stacked on top of existing sparsification techniques for distributed model training.
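The following is a minimal NumPy sketch, not the authors' implementation, illustrating the two ideas summarized in the abstract: measuring how much variance the leading principal components of the gradient-space explain, and approximating a new gradient as a scalar multiple of a previously transmitted "look-back" gradient so that only a single scalar needs to be communicated. The function names, the projection rule, and the toy data are illustrative assumptions.

```python
# Illustrative sketch (assumptions, not the paper's code): (1) explained variance of
# the top-k principal components of a stacked gradient matrix, and (2) a look-back
# scalar obtained by projecting a new gradient onto a stored gradient direction.
import numpy as np

def explained_variance_of_top_k(grad_matrix: np.ndarray, k: int) -> float:
    """grad_matrix: (num_epochs, num_params) matrix of flattened per-epoch gradients."""
    # Singular values of the centered gradient matrix give the principal-component energies.
    s = np.linalg.svd(grad_matrix - grad_matrix.mean(axis=0), compute_uv=False)
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))

def look_back_scalar(new_grad: np.ndarray, look_back_grad: np.ndarray) -> float:
    """Project the new gradient onto the stored look-back gradient; if the
    approximation is good, this scalar is all that needs to be transmitted."""
    return float(new_grad @ look_back_grad / (look_back_grad @ look_back_grad))

# Toy usage with synthetic, approximately low-rank gradients (purely illustrative).
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 1000))                      # a few dominant directions
grads = rng.normal(size=(20, 3)) @ basis + 0.01 * rng.normal(size=(20, 1000))
print("variance explained by top 3 PCs:", explained_variance_of_top_k(grads, 3))

look_back = grads[0]
rho = look_back_scalar(grads[1], look_back)
approx = rho * look_back                                # reconstructed from the scalar
print("scalar multiplier:", rho,
      "relative error:", np.linalg.norm(grads[1] - approx) / np.linalg.norm(grads[1]))
```

In this sketch, transmitting `rho` instead of the full gradient vector stands in for the parameter-to-scalar reduction described in the abstract; the paper's actual algorithm and its convergence analysis are more involved than this toy projection.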