Paper Title
A Variance-Reduced Stochastic Accelerated Primal Dual Algorithm
Paper Authors
Paper Abstract
In this work, we consider strongly convex-strongly concave (SCSC) saddle point (SP) problems $\min_{x\in\mathbb{R}^{d_x}}\max_{y\in\mathbb{R}^{d_y}} f(x,y)$, where $f$ is $L$-smooth, $f(\cdot,y)$ is $\mu$-strongly convex for every $y$, and $f(x,\cdot)$ is $\mu$-strongly concave for every $x$. Such problems arise frequently in machine learning in the context of robust empirical risk minimization (ERM), e.g., \textit{distributionally robust} ERM, where partial gradients are estimated using mini-batches of data points. Assuming we have access to an unbiased stochastic first-order oracle, we consider the stochastic accelerated primal-dual (SAPD) algorithm recently introduced in Zhang et al. [2021] for SCSC SP problems as a method that is robust to gradient noise. In particular, SAPD recovers the well-known stochastic gradient descent ascent (SGDA) method as a special case when the momentum parameter is set to zero, and it achieves an accelerated rate when the momentum parameter is properly tuned, improving the dependence on the condition number $\kappa \triangleq L/\mu$ from $\kappa^2$ for SGDA to $\kappa$. We propose efficient variance-reduction strategies for SAPD based on Richardson-Romberg extrapolation and show that our method improves upon SAPD both in practice and in theory.
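The abstract names Richardson-Romberg extrapolation as the variance-reduction mechanism but does not spell out the scheme. The sketch below illustrates only the generic Richardson-Romberg idea on a plain constant-stepsize SGDA loop (the zero-momentum special case of SAPD mentioned above), applied to a toy SCSC quadratic; the test function, noise model, step sizes, and averaging choices are all assumptions made for illustration, and this is not the paper's SAPD-based algorithm.

```python
# A minimal sketch of the generic Richardson-Romberg extrapolation idea, applied to a
# plain constant-stepsize SGDA loop (SAPD with zero momentum) on a toy SCSC quadratic.
# NOT the paper's algorithm: the objective f(x, y) = (mu/2)||x||^2 + x^T A y - (mu/2)||y||^2,
# the additive Gaussian gradient noise, and all step-size/averaging choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y, mu, sigma = 5, 5, 1.0, 0.5
A = rng.standard_normal((d_x, d_y))


def stochastic_grads(x, y):
    """Unbiased stochastic partial gradients of f at (x, y)."""
    gx = mu * x + A @ y + sigma * rng.standard_normal(d_x)
    gy = A.T @ x - mu * y + sigma * rng.standard_normal(d_y)
    return gx, gy


def sgda_tail_average(eta, n_iters):
    """Run SGDA with constant step size eta; return the average of the last half of the iterates."""
    x, y = np.ones(d_x), np.ones(d_y)
    tail_x, tail_y = [], []
    for k in range(n_iters):
        gx, gy = stochastic_grads(x, y)
        x, y = x - eta * gx, y + eta * gy  # descent in x, ascent in y
        if k >= n_iters // 2:              # discard burn-in, average the tail
            tail_x.append(x)
            tail_y.append(y)
    return np.mean(tail_x, axis=0), np.mean(tail_y, axis=0)


eta, n = 0.05, 40_000
x_coarse, y_coarse = sgda_tail_average(eta, n)        # chain run with step size eta
x_fine, y_fine = sgda_tail_average(eta / 2, 2 * n)    # chain run with step size eta / 2
# Richardson-Romberg combination: targets the leading O(eta) term of the
# step-size-dependent bias in the long-run averages.
x_rr, y_rr = 2 * x_fine - x_coarse, 2 * y_fine - y_coarse
print("distance to the saddle point (0, 0):",
      np.linalg.norm(np.concatenate([x_coarse, y_coarse])),
      np.linalg.norm(np.concatenate([x_rr, y_rr])))
```

The combination $2\,\bar z_{\eta/2} - \bar z_{\eta}$ is designed to cancel the leading $O(\eta)$ term of the step-size-dependent bias of the long-run averages; the actual SAPD-based scheme and its guarantees are developed in the paper itself.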