Paper Title
Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Functions
Paper Authors
Paper Abstract
We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying the gradient dominance property with parameter $1\le\alpha\le 2$, which holds in a wide range of applications in machine learning and signal processing. This condition ensures that any first-order stationary point is a global optimum. We prove that the total sample complexity of SCRN in achieving an $\epsilon$-global optimum is $\mathcal{O}(\epsilon^{-7/(2\alpha)+1})$ for $1\le\alpha<3/2$ and $\tilde{\mathcal{O}}(\epsilon^{-2/\alpha})$ for $3/2\le\alpha\le 2$. SCRN improves the best-known sample complexity of stochastic gradient descent. Even under a weak version of the gradient dominance property, which is applicable to policy-based reinforcement learning (RL), SCRN achieves the same improvement over stochastic policy gradient methods. Additionally, we show that the average sample complexity of SCRN can be reduced to $\mathcal{O}(\epsilon^{-2})$ for $\alpha=1$ using a variance reduction method with time-varying batch sizes. Experimental results in various RL settings showcase the remarkable performance of SCRN compared to first-order methods.
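For context, a minimal sketch of the two ingredients named in the abstract, written in our own notation (the paper's exact definitions and constants may differ). The gradient dominance condition with exponent $\alpha$ states that $f(x) - \min_{y} f(y) \le \tau_f \,\|\nabla f(x)\|^{\alpha}$ for all $x$, with $1 \le \alpha \le 2$ and some constant $\tau_f > 0$; it follows immediately that any point with $\nabla f(x) = 0$ is globally optimal. The cubic-regularized Newton step underlying SCRN, with mini-batch estimates $g_k \approx \nabla f(x_k)$ and $H_k \approx \nabla^2 f(x_k)$ and a cubic regularization parameter $M > 0$, takes the standard form $x_{k+1} = \arg\min_{y}\; g_k^{\top}(y - x_k) + \tfrac{1}{2}(y - x_k)^{\top} H_k (y - x_k) + \tfrac{M}{6}\,\|y - x_k\|^{3}$, where the stochastic estimates replace the exact derivatives used in deterministic cubic regularization.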