Paper Title
Federated Random Reshuffling with Compression and Variance Reduction
Paper Authors
Paper Abstract
Random Reshuffling (RR), a variant of Stochastic Gradient Descent (SGD) that samples without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization. Due to its superior practical performance, it is embedded, and often set as the default, in standard machine learning software. Under the name FedRR, this method was recently shown to be applicable to federated learning (Mishchenko et al., 2021), with superior performance compared to common baselines such as Local SGD. Inspired by this development, we design three new algorithms that further improve FedRR: compressed FedRR and two variance-reduced extensions, one for taming the variance coming from shuffling and the other for taming the variance due to compression. The variance reduction mechanism for compression allows us to eliminate the dependence on the compression parameter, and applying the controlled linear perturbations for Random Reshuffling introduced by Malinovsky et al. (2021) helps to eliminate the variance at the optimum. We provide the first analysis of compressed local methods under standard assumptions, without bounded-gradient assumptions and for heterogeneous data, overcoming the limitations of the compression operator. We corroborate our theoretical results with experiments on synthetic and real datasets.
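To make the setting concrete, here is a minimal Python sketch of a federated Random Reshuffling loop with compressed client updates on a toy least-squares problem. This is an illustrative assumption of the general scheme, not the paper's pseudocode: the rand-k sparsifier used as the compressor, the synthetic data, and all names (`rand_k`, `lr`, `k`) are hypothetical choices for exposition.

```python
# Illustrative sketch (assumed setup, not the authors' exact algorithm):
# each client runs one epoch of SGD over a fresh permutation of its local
# data (sampling WITHOUT replacement), compresses its model update with an
# unbiased rand-k sparsifier, and the server averages the compressed updates.
import numpy as np

rng = np.random.default_rng(0)

def rand_k(v, k):
    """Unbiased random-k sparsifier: keep k random coordinates, rescale by d/k."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

# Toy heterogeneous data: M clients, each holding n local (a_i, b_i) pairs.
M, n, d = 10, 20, 5
A = rng.normal(size=(M, n, d))
b = rng.normal(size=(M, n))

x = np.zeros(d)                 # server model
lr, k, epochs = 0.05, 2, 200    # hypothetical hyperparameters

for _ in range(epochs):
    deltas = []
    for m in range(M):
        y = x.copy()
        for i in rng.permutation(n):                  # random reshuffling
            g = (A[m, i] @ y - b[m, i]) * A[m, i]     # per-sample gradient
            y -= lr * g
        deltas.append(rand_k(y - x, k))               # compress local update
    x += np.mean(deltas, axis=0)                      # server aggregation

print("final residual:", np.mean([(A[m] @ x - b[m]) ** 2 for m in range(M)]))
```

The variance-reduced extensions discussed in the abstract would additionally maintain control variates so that the noise injected by the compressor and by the shuffling vanishes at the optimum; those corrections are omitted here for brevity.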