Paper Title

Detached Error Feedback for Distributed SGD with Random Sparsification

Paper Authors

An Xu, Heng Huang

Paper Abstract

The communication bottleneck has been a critical problem in large-scale distributed deep learning. In this work, we study distributed SGD with random block-wise sparsification as the gradient compressor, which is ring-allreduce compatible and highly computation-efficient but leads to inferior performance. To tackle this important issue, we improve communication-efficient distributed SGD from a novel aspect, namely the trade-off between the variance and the second moment of the gradient. With this motivation, we propose a new detached error feedback (DEF) algorithm, which shows a better convergence bound than error feedback for non-convex problems. We also propose DEF-A to accelerate the generalization of DEF in the early stages of training, with a better generalization bound than DEF. Furthermore, we establish, for the first time, the connection between communication-efficient distributed SGD and SGD with iterate averaging (SGD-IA). Extensive deep learning experiments show significant empirical improvements of the proposed methods under various settings.
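For context, the baseline setup the abstract refers to can be sketched in a few lines: random block-wise sparsification used as a gradient compressor inside classical error feedback (EF-SGD). This is a minimal NumPy sketch of the baseline the paper improves upon, not the proposed DEF algorithm; the function names, the single-worker step, and the choice to keep exactly one unscaled block are illustrative assumptions.

```python
import numpy as np

def random_block_sparsify(grad, num_blocks, rng):
    """Random block-wise sparsification: keep one randomly chosen
    contiguous block of the gradient and zero out the rest.
    When all workers share the random seed, they keep the same block,
    which is what makes this compressor ring-allreduce compatible."""
    blocks = np.array_split(grad, num_blocks)
    kept = rng.integers(num_blocks)
    return np.concatenate([
        b if i == kept else np.zeros_like(b)
        for i, b in enumerate(blocks)
    ])

def ef_sgd_step(w, grad, residual, lr, num_blocks, rng):
    """One step of classical error feedback (EF-SGD), the baseline
    that DEF is compared against: compress the residual-corrected
    gradient, apply only the compressed part, and carry the
    compression error over to the next iteration."""
    corrected = grad + residual
    compressed = random_block_sparsify(corrected, num_blocks, rng)
    new_residual = corrected - compressed  # error kept for next step
    return w - lr * compressed, new_residual
```

Because only one of num_blocks blocks is transmitted per step, communication drops by roughly a factor of num_blocks; the residual term is what error feedback adds to recover the accuracy lost to this biased compressor, and DEF modifies how this correction interacts with the model update.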
