Paper Title
Achieving Efficient Distributed Machine Learning Using a Novel Non-Linear Class of Aggregation Functions
Paper Authors
Paper Abstract
Distributed machine learning (DML) over time-varying networks can be an enabler for emerging decentralized ML applications such as autonomous driving and drone fleets. However, the weighted arithmetic mean model aggregation function commonly used in existing DML systems can result in high model loss, low model accuracy, and slow convergence speed over time-varying networks. To address this issue, in this paper, we propose a novel non-linear class of model aggregation functions to achieve efficient DML over time-varying networks. Instead of taking a linear aggregation of neighboring models as most existing studies do, our mechanism uses a nonlinear aggregation, a weighted power-p mean (WPM), as the aggregation function of local models from neighbors. The subsequent optimization steps are taken using mirror descent defined by a Bregman divergence, which maintains convergence to optimality. In this paper, we analyze the properties of the WPM and rigorously prove the convergence properties of our aggregation mechanism. Additionally, through extensive experiments, we show that when p > 1, our design significantly improves the convergence speed of the model and the scalability of DML under time-varying networks compared with arithmetic mean aggregation functions, with little additional computation overhead.
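
For intuition, the sketch below illustrates what a weighted power-p mean aggregation of neighbors' local models could look like. The function name weighted_power_mean, the NumPy-based implementation, and the assumption of non-negative parameter values are illustrative assumptions rather than the paper's actual mechanism, which additionally couples aggregation with mirror-descent updates defined by a Bregman divergence.

```python
import numpy as np

def weighted_power_mean(models, weights, p=2.0):
    """Aggregate neighbors' local model parameters with a weighted power-p mean (WPM).

    models:  list of equally shaped parameter arrays, one per neighbor
    weights: non-negative mixing weights summing to 1
    p:       power parameter; p = 1 recovers the weighted arithmetic mean
    """
    stacked = np.stack(models)  # shape: (n_neighbors, ...) parameter tensor
    w = np.asarray(weights, dtype=float).reshape(-1, *([1] * (stacked.ndim - 1)))
    # Element-wise (sum_i w_i * x_i^p)^(1/p); assumes non-negative parameter
    # values so that the fractional power is well defined.
    return np.power(np.sum(w * np.power(stacked, p), axis=0), 1.0 / p)

# Toy usage: three neighbors, two parameters each.
models = [np.array([0.2, 0.5]), np.array([0.4, 0.1]), np.array([0.3, 0.3])]
weights = [0.5, 0.25, 0.25]
print(weighted_power_mean(models, weights, p=2.0))  # p > 1, the regime studied in the paper
print(weighted_power_mean(models, weights, p=1.0))  # reduces to the weighted arithmetic mean
```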