Paper Title

Theoretical Analysis of Primal-Dual Algorithm for Non-Convex Stochastic Decentralized Optimization

Authors

Yuki Takezawa, Kenta Niwa, Makoto Yamada

Abstract

In recent years, decentralized learning has emerged as a powerful tool not only for large-scale machine learning, but also for preserving privacy. One of the key challenges in decentralized learning is that the data distribution held by each node is statistically heterogeneous. To address this challenge, a primal-dual algorithm called Edge-Consensus Learning (ECL) was proposed and was experimentally shown to be robust to the heterogeneity of data distributions. However, the convergence rate of the ECL has been provided only when the objective function is convex, and has not been shown in the standard machine learning setting where the objective function is non-convex. Furthermore, the intuitive reason why the ECL is robust to the heterogeneity of data distributions has not been investigated. In this work, we first investigate the relationship between the ECL and the Gossip algorithm and show that the update formulas of the ECL can be regarded as correcting the local stochastic gradient in the Gossip algorithm. We then propose the Generalized ECL (G-ECL), which contains the ECL as a special case, and provide convergence rates for the G-ECL in both (strongly) convex and non-convex settings, which do not depend on the heterogeneity of data distributions. Through synthetic experiments, we demonstrate that the numerical results of both the G-ECL and the ECL coincide with the convergence rate of the G-ECL.
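To make the Gossip baseline discussed in the abstract concrete, the following is a minimal sketch (not the ECL update from the paper) of a Gossip-style decentralized SGD step: each node mixes its parameters with its neighbors via a doubly stochastic matrix `W`, then takes a local gradient step. The toy losses, mixing matrix, and step size below are illustrative assumptions; on heterogeneous local objectives, plain Gossip SGD with a constant step size leaves a residual bias, which is the kind of heterogeneity-dependence the ECL's gradient correction is said to remove.

```python
import numpy as np

def gossip_sgd_step(X, W, grads, lr):
    """One Gossip (decentralized SGD) step.
    X: (n_nodes, dim) local parameters; W: (n_nodes, n_nodes) doubly
    stochastic mixing matrix; grads: (n_nodes, dim) local gradients."""
    return W @ X - lr * grads

# Toy example: 3 fully connected nodes with heterogeneous quadratic
# losses f_i(x) = 0.5 * (x - b_i)^2, so each node's local optimum b_i
# differs while the global optimum is mean(b) = 2.0.
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
b = np.array([[1.0], [2.0], [3.0]])
X = np.zeros((3, 1))
for _ in range(200):
    X = gossip_sgd_step(X, W, X - b, lr=0.1)

# The node average converges to the global optimum 2.0, but individual
# nodes retain a small heterogeneity-induced gap around it (a known
# artifact of constant-stepsize Gossip SGD on heterogeneous data).
print(X.mean(), np.abs(X - 2.0).max())
```

The residual per-node gap (roughly 0.12 here) shrinks with the step size but never vanishes for plain Gossip SGD, which illustrates why a correction term of the kind attributed to the ECL is needed for heterogeneity-independent rates.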
