Paper Title

Detached Error Feedback for Distributed SGD with Random Sparsification

Paper Authors

An Xu, Heng Huang

Paper Abstract

The communication bottleneck has been a critical problem in large-scale distributed deep learning. In this work, we study distributed SGD with random block-wise sparsification as the gradient compressor, which is ring-allreduce compatible and highly computation-efficient but leads to inferior performance. To tackle this important issue, we improve communication-efficient distributed SGD from a novel aspect, namely the trade-off between the variance and the second moment of the gradient. With this motivation, we propose a new detached error feedback (DEF) algorithm, which shows a better convergence bound than error feedback for non-convex problems. We also propose DEF-A to accelerate the generalization of DEF in the early stages of training, with a better generalization bound than DEF. Furthermore, we establish, for the first time, the connection between communication-efficient distributed SGD and SGD with iterate averaging (SGD-IA). Extensive deep learning experiments show significant empirical improvements of the proposed methods under various settings.
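For context, the baseline setup the abstract refers to can be sketched in a few lines: random block-wise sparsification used as a gradient compressor inside classical error feedback (EF-SGD). This is a minimal NumPy sketch of the baseline the paper improves upon, not the proposed DEF algorithm; the function names, the single-worker step, and the choice to keep exactly one unscaled block are illustrative assumptions.

```python
import numpy as np

def random_block_sparsify(grad, num_blocks, rng):
    """Random block-wise sparsification: keep one randomly chosen
    contiguous block of the gradient and zero out the rest.
    When all workers share the random seed, they keep the same block,
    which is what makes this compressor ring-allreduce compatible."""
    blocks = np.array_split(grad, num_blocks)
    kept = rng.integers(num_blocks)
    return np.concatenate([
        b if i == kept else np.zeros_like(b)
        for i, b in enumerate(blocks)
    ])

def ef_sgd_step(w, grad, residual, lr, num_blocks, rng):
    """One step of classical error feedback (EF-SGD), the baseline
    that DEF is compared against: compress the residual-corrected
    gradient, apply only the compressed part, and carry the
    compression error over to the next iteration."""
    corrected = grad + residual
    compressed = random_block_sparsify(corrected, num_blocks, rng)
    new_residual = corrected - compressed  # error kept for next step
    return w - lr * compressed, new_residual
```

Because only one of num_blocks blocks is transmitted per step, communication drops by roughly a factor of num_blocks; the residual term is what error feedback adds to recover the accuracy lost to this biased compressor, and DEF modifies how this correction interacts with the model update.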
