Paper Title

Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction

Paper Authors

Liu, Zijian, Nguyen, Ta Duy, Ene, Alina, Nguyen, Huy L.

Paper Abstract

In this paper, we study the finite-sum convex optimization problem focusing on the general convex case. Recently, the study of variance reduced (VR) methods and their accelerated variants has made exciting progress. However, the step size used in the existing VR algorithms typically depends on the smoothness parameter, which is often unknown and requires tuning in practice. To address this problem, we propose two novel adaptive VR algorithms: Adaptive Variance Reduced Accelerated Extra-Gradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG). Our algorithms do not require knowledge of the smoothness parameter. AdaVRAE uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta}{\epsilon}}\right)$ gradient evaluations and AdaVRAG uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta\log\beta}{\epsilon}}\right)$ gradient evaluations to attain an $\mathcal{O}(\epsilon)$-suboptimal solution, where $n$ is the number of functions in the finite sum and $\beta$ is the smoothness parameter. This result matches the best-known convergence rate of non-adaptive VR methods, and it improves upon the convergence of the state-of-the-art adaptive VR method, AdaSVRG. We demonstrate the superior performance of our algorithms compared with previous methods in experiments on real-world datasets.
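The abstract combines two ideas: a variance-reduced gradient estimator and a step size that adapts without knowing the smoothness parameter $\beta$. The sketch below is only an illustration of those two ingredients under common assumptions (an SVRG-style estimator and an AdaGrad-style step size); it is not the AdaVRAE/AdaVRAG update from the paper, and the names `variance_reduced_step`, `grad_i`, and `eta_accum` are hypothetical.

```python
# Illustrative sketch only (NOT the AdaVRAE/AdaVRAG updates from the paper):
# an SVRG-style variance-reduced gradient estimate combined with an
# AdaGrad-style step size, so no smoothness parameter beta is needed.
import numpy as np

def variance_reduced_step(x, x_snapshot, full_grad_snapshot, grad_i, eta_accum):
    """One hypothetical inner step.

    x                  -- current iterate
    x_snapshot         -- snapshot point where the full gradient was computed
    full_grad_snapshot -- (1/n) * sum_i grad f_i(x_snapshot)
    grad_i             -- callable: grad_i(y) returns the gradient of one
                          sampled component f_i at y
    eta_accum          -- running sum of squared estimate norms (adaptive state)
    """
    # SVRG-style variance-reduced gradient estimate:
    # g = grad f_i(x) - grad f_i(x_snapshot) + full gradient at the snapshot.
    g = grad_i(x) - grad_i(x_snapshot) + full_grad_snapshot

    # AdaGrad-style step size: the denominator grows with observed gradient
    # magnitudes instead of using a beta-dependent constant step size.
    eta_accum = eta_accum + np.dot(g, g)
    step = 1.0 / np.sqrt(eta_accum + 1e-12)

    return x - step * g, eta_accum
```

The actual algorithms add acceleration and extra-gradient steps and use a different adaptive step-size rule; the paper gives the precise updates that achieve the stated $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta}{\epsilon}}\right)$ complexity.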
