Paper Title
Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Functions
Paper Authors
Paper Abstract
We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying the gradient dominance property with parameter $1\le\alpha\le 2$, which holds in a wide range of applications in machine learning and signal processing. This condition ensures that any first-order stationary point is a global optimum. We prove that the total sample complexity of SCRN in achieving an $\epsilon$-global optimum is $\mathcal{O}(\epsilon^{-7/(2\alpha)+1})$ for $1\le\alpha<3/2$ and $\tilde{\mathcal{O}}(\epsilon^{-2/\alpha})$ for $3/2\le\alpha\le 2$. SCRN improves the best-known sample complexity of stochastic gradient descent. Even under a weak version of the gradient dominance property, which is applicable to policy-based reinforcement learning (RL), SCRN achieves the same improvement over stochastic policy gradient methods. Additionally, we show that the average sample complexity of SCRN can be reduced to $\mathcal{O}(\epsilon^{-2})$ for $\alpha=1$ using a variance reduction method with time-varying batch sizes. Experimental results in various RL settings showcase the remarkable performance of SCRN compared to first-order methods.
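For context, a minimal sketch of the two ingredients named in the abstract, written in our own notation (the paper's exact definitions and constants may differ). The gradient dominance condition with exponent $\alpha$ states that $f(x) - \min_{y} f(y) \le \tau_f \,\|\nabla f(x)\|^{\alpha}$ for all $x$, with $1 \le \alpha \le 2$ and some constant $\tau_f > 0$; it follows immediately that any point with $\nabla f(x) = 0$ is globally optimal. The cubic-regularized Newton step underlying SCRN, with mini-batch estimates $g_k \approx \nabla f(x_k)$ and $H_k \approx \nabla^2 f(x_k)$ and a cubic regularization parameter $M > 0$, takes the standard form $x_{k+1} = \arg\min_{y}\; g_k^{\top}(y - x_k) + \tfrac{1}{2}(y - x_k)^{\top} H_k (y - x_k) + \tfrac{M}{6}\,\|y - x_k\|^{3}$, where the stochastic estimates replace the exact derivatives used in deterministic cubic regularization.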