Paper Title
On Communication Compression for Distributed Optimization on Heterogeneous Data
Paper Authors
Paper Abstract
Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models. We analyze the performance of two standard and general types of methods: (i) distributed quantized SGD (D-QSGD) with arbitrary unbiased quantizers, and (ii) distributed SGD with error feedback and biased compressors (D-EF-SGD), in the heterogeneous (non-iid) data setting. Our results indicate that D-EF-SGD is much less affected than D-QSGD by non-iid data, but both methods can suffer a slowdown if data skewness is high. We further study two alternatives that are not (or much less) affected by heterogeneous data distributions: first, a recently proposed method that is effective on strongly convex problems, and second, a more general approach that is applicable only to linear compressors but is effective in all considered scenarios.
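To make the two compression paradigms named in the abstract concrete, below is a minimal NumPy sketch, not the paper's implementation, of (i) an unbiased random sparsifier of the kind D-QSGD admits and (ii) a biased top-k compressor combined with error feedback, as in D-EF-SGD. The function names, the step-size placement, and the single-round server-averaging structure are illustrative assumptions.

```python
import numpy as np

def rand_p(v, p, rng):
    """Unbiased random sparsifier (one simple D-QSGD-style quantizer):
    keep each coordinate with probability p and rescale by 1/p,
    so that E[rand_p(v)] = v."""
    mask = rng.random(v.shape) < p
    return np.where(mask, v / p, 0.0)

def top_k(v, k):
    """Biased top-k compressor: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd_round(x, grads, errors, lr, k):
    """One round in the D-EF-SGD pattern (illustrative sketch):
    each worker compresses its error-corrected update with top-k,
    stores the compression residual locally, and the server averages
    the compressed messages to update the model x."""
    msgs = []
    for i, g in enumerate(grads):
        corrected = lr * g + errors[i]   # add back residual from the last round
        msg = top_k(corrected, k)        # biased compression of the update
        errors[i] = corrected - msg      # error feedback: remember what was dropped
        msgs.append(msg)
    return x - np.mean(msgs, axis=0), errors
```

Replacing the error-feedback loop with a direct average of rand_p-compressed gradients gives the D-QSGD pattern; the locally stored error memory is what allows the biased top-k compressor to remain convergent despite discarding information each round.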