Paper Title
New Bounds For Distributed Mean Estimation and Variance Reduction
Paper Authors
Paper Abstract
We consider the problem of distributed mean estimation (DME), in which $n$ machines are each given a local $d$-dimensional vector $x_v \in \mathbb{R}^d$, and must cooperate to estimate the mean of their inputs $\mu = \frac{1}{n}\sum_{v=1}^n x_v$, while minimizing total communication cost. DME is a fundamental construct in distributed machine learning, and there has been considerable work on variants of this problem, especially in the context of distributed variance reduction for stochastic gradients in parallel SGD. Previous work typically assumes an upper bound on the norm of the input vectors, and achieves an error bound in terms of this norm. However, in many real applications, the input vectors are concentrated around the correct output $\mu$, but $\mu$ itself has large norm. In such cases, previous output error bounds perform poorly. In this paper, we show that output error bounds need not depend on input norm. We provide a method of quantization which allows distributed mean estimation to be performed with solution quality dependent only on the distance between inputs, not on input norm, and show an analogous result for distributed variance reduction. The technique is based on a new connection with lattice theory. We also provide lower bounds showing that the communication-to-error trade-off of our algorithms is asymptotically optimal. As the lattices achieving optimal bounds under the $\ell_2$-norm can be computationally impractical, we also present an extension which leverages easy-to-use cubic lattices, and is loose only up to a logarithmic factor in $d$. We show experimentally that our method yields practical improvements for common applications, relative to prior approaches.
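The following is a minimal Python sketch (assuming NumPy) of the cubic-lattice intuition behind the abstract, not the paper's actual algorithm: if the receiver holds a reference point known to lie coordinate-wise within distance R of the sender's input, the sender can round its vector onto the cubic lattice eps * Z^d and transmit only the residues of the lattice coordinates modulo a small integer q, so the bits per coordinate scale with R/eps rather than with the norm of the input. The function names (encode, decode) and parameters (eps, q, R) are illustrative assumptions.

import numpy as np

def encode(x, eps, q, rng):
    """Randomized rounding of x onto the cubic lattice eps*Z^d, then reduction mod q.

    Randomized rounding keeps the quantization unbiased; only the residues
    (about log2(q) bits per coordinate) need to be transmitted.
    """
    scaled = x / eps
    low = np.floor(scaled)
    k = low + (rng.random(x.shape) < (scaled - low))  # each coordinate rounds up w.p. its fractional part
    return np.mod(k, q).astype(np.int64)

def decode(residues, y, eps, q):
    """Recover the sender's lattice point from its residues, using a reference point y.

    Decoding is exact whenever the sender's rounded point lies within q/2 lattice
    steps of round(y/eps) in every coordinate, i.e. whenever q*eps comfortably
    exceeds twice the coordinate-wise distance between x and y.
    """
    c = np.round(y / eps)
    delta = np.mod(residues - c, q)
    delta = np.where(delta >= q / 2, delta - q, delta)  # shift residues into the window centred at c
    return eps * (c + delta)

rng = np.random.default_rng(0)
d, R, eps = 10_000, 1.0, 0.05
q = int(2 * (R / eps + 2)) + 1           # enough residues to decode unambiguously
mu = 1e6 * rng.standard_normal(d)        # the mean has large norm ...
x = mu + rng.uniform(-R, R, size=d)      # ... but the input is close to the reference y = mu
x_hat = decode(encode(x, eps, q, rng), mu, eps, q)
print("max per-coordinate error:", np.max(np.abs(x_hat - x)))  # at most eps, independent of the norm of x
print("bits per coordinate:", int(np.ceil(np.log2(q))))        # depends only on R/eps

In a DME-style deployment of this sketch, each machine would send such residues and the server would decode them against its own estimate before averaging; per the abstract, the cubic lattice used here is the easy-to-use variant that loses a logarithmic factor in $d$, while the optimal $\ell_2$ bounds require lattices that may be computationally impractical.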