论文标题

Nesterov加速度的梯度规范最小化:$ O(1/k^3)$

Gradient Norm Minimization of Nesterov Acceleration: $o(1/k^3)$

论文作者

Chen, Shuo, Shi, Bin, Yuan, Ya-xiang

论文摘要

在一阶算法的历史中,Nesterov的加速梯度下降(NAG)是里程碑之一。但是,长期以来,加速的原因一直是一个谜。直到[Shi等人,2021年]提出的高分辨率微分方程框架之前,它尚未得到梯度校正的存在。在本文中,我们继续研究加速现象。首先,我们基于精确的观察和$ l $ smorth功能的不等式提供了明显的简化证明。然后,提出了一个新的隐式高分辨率高分辨率微分方程框架,以及相应的隐式 - 速度版本的相位空间表示和lyapunov函数,以研究迭代序列$ \ {x_k \} _ {x_k \} _ {k = 0}^{k = 0}^{\ infty} $ nag的收敛行为。此外,从两种相位空间表示形式中,我们发现,梯度校正所起的作用等同于按速度隐含在梯度中包含的,其中唯一的区别来自迭代序列$ \ \ {y_ {k} {k} {k} _ {k} _ { $ \ {x_k \} _ {k = 0}^{\ infty} $。最后,对于NAG梯度规范最小化是否具有更快的速率$ O(1/k^3)$的开放性问题,我们为其证明找出了一个积极的答案。同时,为$ r> 2 $显示,显示更快的客观价值最小化$ O(1/k^2)$。

In the history of first-order algorithms, Nesterov's accelerated gradient descent (NAG) is one of the milestones. However, the cause of the acceleration has been a mystery for a long time. It has not been revealed with the existence of gradient correction until the high-resolution differential equation framework proposed in [Shi et al., 2021]. In this paper, we continue to investigate the acceleration phenomenon. First, we provide a significantly simplified proof based on precise observation and a tighter inequality for $L$-smooth functions. Then, a new implicit-velocity high-resolution differential equation framework, as well as the corresponding implicit-velocity version of phase-space representation and Lyapunov function, is proposed to investigate the convergence behavior of the iterative sequence $\{x_k\}_{k=0}^{\infty}$ of NAG. Furthermore, from two kinds of phase-space representations, we find that the role played by gradient correction is equivalent to that by velocity included implicitly in the gradient, where the only difference comes from the iterative sequence $\{y_{k}\}_{k=0}^{\infty}$ replaced by $\{x_k\}_{k=0}^{\infty}$. Finally, for the open question of whether the gradient norm minimization of NAG has a faster rate $o(1/k^3)$, we figure out a positive answer with its proof. Meanwhile, a faster rate of objective value minimization $o(1/k^2)$ is shown for the case $r > 2$.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源