Paper Title

Scaling the Wild: Decentralizing Hogwild!-style Shared-memory SGD

Paper Authors

Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh

Paper Abstract

Powered by the simplicity of lock-free asynchrony, Hogwild! is a go-to approach to parallelize SGD over a shared-memory setting. Despite its popularity and concomitant extensions, such as PASSM+ wherein concurrent processes update a shared model with partitioned gradients, scaling it to decentralized workers has surprisingly been relatively unexplored. To our knowledge, there is no convergence theory of such methods, nor systematic numerical comparisons evaluating speed-up. In this paper, we propose an algorithm incorporating a decentralized distributed-memory computing architecture, with each node itself running multiprocessing parallel shared-memory SGD. Our scheme is based on the following algorithmic tools and features: (a) asynchronous local gradient updates on the shared memory of workers, (b) partial backpropagation, and (c) non-blocking in-place averaging of the local models. We prove that our method guarantees ergodic convergence rates for non-convex objectives. On the practical side, we show that the proposed method exhibits improved throughput and competitive accuracy on standard image classification benchmarks using the CIFAR-10, CIFAR-100, and ImageNet datasets. Our code is available at https://github.com/bapi/LPP-SGD.
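
To make the abstract's ingredients concrete, below is a minimal, illustrative sketch of two of them: Hogwild!-style lock-free SGD threads updating a shared model, and an in-place averaging step with a peer model applied while the workers keep running. This is not the authors' implementation (see the LPP-SGD repository for that); the toy least-squares problem and the names sgd_worker and average_inplace are assumptions introduced only for illustration, and partial backpropagation is not shown.

```python
# Illustrative sketch only: Hogwild!-style lock-free SGD on a shared model,
# plus an in-place averaging step with a (simulated) peer model.
import threading
import numpy as np

# Toy least-squares objective: minimize (1/n) * ||X w - y||^2 over w.
rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 16))
w_true = rng.standard_normal(16)
y = X @ w_true

# Shared model: all threads read and update it without locks (Hogwild! style).
w = np.zeros(16)

def sgd_worker(seed: int, num_steps: int = 2000, lr: float = 1e-3, batch: int = 8) -> None:
    """Sample mini-batches and apply racy, lock-free updates to the shared w."""
    local_rng = np.random.default_rng(seed)
    for _ in range(num_steps):
        idx = local_rng.integers(0, X.shape[0], size=batch)
        grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w[:] -= lr * grad  # in-place update on shared memory, no lock taken

def average_inplace(peer_model: np.ndarray) -> None:
    """Mix the shared model with a peer's copy in place, without pausing the
    SGD threads (a simplification of decentralized non-blocking averaging)."""
    w[:] = 0.5 * (w + peer_model)

threads = [threading.Thread(target=sgd_worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()

# Pretend a slightly different peer model arrives mid-training and is averaged in.
average_inplace(w + 0.01 * rng.standard_normal(16))

for t in threads:
    t.join()

print("final mean-squared error:", float(np.mean((X @ w - y) ** 2)))
```

In the paper's setting, each node of a decentralized cluster would run such a group of shared-memory workers on its own local model, and the averaging step would mix models across nodes without blocking the local updates; the sketch above collapses this to a single process purely to show the shape of the two operations.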
