Paper Title
A Variance-Reduced Stochastic Accelerated Primal Dual Algorithm
Paper Authors
Paper Abstract
In this work, we consider strongly convex-strongly concave (SCSC) saddle point (SP) problems $\min_{x\in\mathbb{R}^{d_x}}\max_{y\in\mathbb{R}^{d_y}} f(x,y)$, where $f$ is $L$-smooth, $f(\cdot,y)$ is $\mu$-strongly convex for every $y$, and $f(x,\cdot)$ is $\mu$-strongly concave for every $x$. Such problems arise frequently in machine learning in the context of robust empirical risk minimization (ERM), e.g., \textit{distributionally robust} ERM, where partial gradients are estimated using mini-batches of data points. Assuming we have access to an unbiased stochastic first-order oracle, we consider the stochastic accelerated primal-dual (SAPD) algorithm recently introduced in Zhang et al. [2021] for SCSC SP problems as a method that is robust to gradient noise. In particular, SAPD recovers the well-known stochastic gradient descent ascent (SGDA) method as a special case when the momentum parameter is set to zero, and it achieves an accelerated rate when the momentum parameter is properly tuned, improving the dependence on the condition number $\kappa \triangleq L/\mu$ from $\kappa^2$ for SGDA to $\kappa$. We propose efficient variance-reduction strategies for SAPD based on Richardson-Romberg extrapolation and show that our method improves upon SAPD both in practice and in theory.
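The abstract names Richardson-Romberg extrapolation as the variance-reduction mechanism but does not spell out the scheme. The sketch below illustrates only the generic Richardson-Romberg idea on a plain constant-stepsize SGDA loop (the zero-momentum special case of SAPD mentioned above), applied to a toy SCSC quadratic; the test function, noise model, step sizes, and averaging choices are all assumptions made for illustration, and this is not the paper's SAPD-based algorithm.

```python
# A minimal sketch of the generic Richardson-Romberg extrapolation idea, applied to a
# plain constant-stepsize SGDA loop (SAPD with zero momentum) on a toy SCSC quadratic.
# NOT the paper's algorithm: the objective f(x, y) = (mu/2)||x||^2 + x^T A y - (mu/2)||y||^2,
# the additive Gaussian gradient noise, and all step-size/averaging choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y, mu, sigma = 5, 5, 1.0, 0.5
A = rng.standard_normal((d_x, d_y))


def stochastic_grads(x, y):
    """Unbiased stochastic partial gradients of f at (x, y)."""
    gx = mu * x + A @ y + sigma * rng.standard_normal(d_x)
    gy = A.T @ x - mu * y + sigma * rng.standard_normal(d_y)
    return gx, gy


def sgda_tail_average(eta, n_iters):
    """Run SGDA with constant step size eta; return the average of the last half of the iterates."""
    x, y = np.ones(d_x), np.ones(d_y)
    tail_x, tail_y = [], []
    for k in range(n_iters):
        gx, gy = stochastic_grads(x, y)
        x, y = x - eta * gx, y + eta * gy  # descent in x, ascent in y
        if k >= n_iters // 2:              # discard burn-in, average the tail
            tail_x.append(x)
            tail_y.append(y)
    return np.mean(tail_x, axis=0), np.mean(tail_y, axis=0)


eta, n = 0.05, 40_000
x_coarse, y_coarse = sgda_tail_average(eta, n)        # chain run with step size eta
x_fine, y_fine = sgda_tail_average(eta / 2, 2 * n)    # chain run with step size eta / 2
# Richardson-Romberg combination: targets the leading O(eta) term of the
# step-size-dependent bias in the long-run averages.
x_rr, y_rr = 2 * x_fine - x_coarse, 2 * y_fine - y_coarse
print("distance to the saddle point (0, 0):",
      np.linalg.norm(np.concatenate([x_coarse, y_coarse])),
      np.linalg.norm(np.concatenate([x_rr, y_rr])))
```

The combination $2\,\bar z_{\eta/2} - \bar z_{\eta}$ is designed to cancel the leading $O(\eta)$ term of the step-size-dependent bias of the long-run averages; the actual SAPD-based scheme and its guarantees are developed in the paper itself.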