Paper Title


Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods

Authors

Aleksandr Beznosikov, Eduard Gorbunov, Hugo Berard, Nicolas Loizou

Abstract


Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequalities problems (VIP) appearing in various machine learning tasks. The success of the method led to several advanced extensions of the classical SGDA, including variants with arbitrary sampling, variance reduction, coordinate randomization, and distributed variants with compression, which were extensively studied in the literature, especially during the last few years. In this paper, we propose a unified convergence analysis that covers a large variety of stochastic gradient descent-ascent methods, which so far have required different intuitions, have different applications and have been developed separately in various communities. A key to our unified framework is a parametric assumption on the stochastic estimates. Via our general theoretical framework, we either recover the sharpest known rates for the known special cases or tighten them. Moreover, to illustrate the flexibility of our approach we develop several new variants of SGDA such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA). Although variants of the new methods are known for solving minimization problems, they were never considered or analyzed for solving min-max problems and VIPs. We also demonstrate the most important properties of the new methods through extensive numerical experiments.
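For readers unfamiliar with the base method that the paper's framework unifies, the following is a minimal sketch of plain SGDA on a toy strongly-convex-strongly-concave saddle-point problem. The objective, step size, and noise model are illustrative assumptions only and do not correspond to any of the paper's new variants (L-SVRGDA, QSGDA, DIANA-SGDA, VR-DIANA-SGDA, SEGA-SGDA) or experiments.

```python
# Minimal illustrative sketch (not the paper's method): plain SGDA on
# min_x max_y f(x, y) with f(x, y) = 0.5*x^2 + x*y - 0.5*y^2,
# whose unique saddle point is (0, 0).
import numpy as np

rng = np.random.default_rng(0)

def grad_x(x, y):
    # partial derivative of f with respect to x
    return x + y

def grad_y(x, y):
    # partial derivative of f with respect to y
    return x - y

x, y = 5.0, -3.0   # arbitrary starting point (assumed for illustration)
step = 0.1         # constant step size gamma (assumed)
noise = 0.01       # scale of additive gradient noise (assumed)

for _ in range(1000):
    gx = grad_x(x, y) + noise * rng.standard_normal()  # stochastic estimate of grad_x f
    gy = grad_y(x, y) + noise * rng.standard_normal()  # stochastic estimate of grad_y f
    x -= step * gx  # descent step in x
    y += step * gy  # ascent step in y

print(x, y)  # both iterates end up in a small neighborhood of the saddle point (0, 0)
```

The extensions surveyed in the paper keep this descent-ascent structure but change how the stochastic gradient estimates are formed (arbitrary sampling, variance reduction, coordinate randomization, or compressed communication in the distributed setting), which is exactly what the paper's parametric assumption on the estimates is designed to cover.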
