Asymptotic behaviour of learning rates in Armijo's condition

Paper title

Asymptotic behaviour of learning rates in Armijo's condition

Authors

Truong, Tuyen Trung; Nguyen, Tuan Hang

Abstract

Fix a constant $0<\alpha<1$. For a $C^1$ function $f:\mathbb{R}^k\rightarrow \mathbb{R}$, a point $x$ and a positive number $\delta>0$, we say that Armijo's condition is satisfied if $f(x-\delta\nabla f(x))-f(x)\leq -\alpha\delta\|\nabla f(x)\|^2$. It is the basis for the well-known Backtracking Gradient Descent (Backtracking GD) algorithm. Consider a sequence $\{x_n\}$ defined by $x_{n+1}=x_n-\delta_n\nabla f(x_n)$, for positive numbers $\delta_n$ for which Armijo's condition is satisfied. We show that if $\{x_n\}$ converges to a non-degenerate critical point, then $\{\delta_n\}$ must be bounded. Moreover, this boundedness can be quantified in terms of the norms of the Hessian $\nabla^2 f$ and its inverse at the limit point. This complements the first author's results on Unbounded Backtracking GD, and shows that in the case of convergence to a non-degenerate critical point the behaviour of Unbounded Backtracking GD is not too different from that of usual Backtracking GD. On the other hand, in the case of convergence to a degenerate critical point the behaviours can be very different. We run some experiments to illustrate that both scenarios can really happen. In another part of the paper, we argue that Backtracking GD has the correct unit (according to a definition by Zeiler in his Adadelta paper). The main point is that since the learning rate in Backtracking GD is bounded by Armijo's condition, it is not unitless.
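The boundedness claim can be checked by hand in one simple special case, which the sketch below illustrates. It is not taken from the paper: the quadratic test function $f(x)=\frac{1}{2}x^TAx$, the matrix `A`, the shrink factor `beta`, the initial trial rate `delta0`, and the helper names `armijo_holds` and `backtracking_gd` are all assumptions chosen for illustration. For a symmetric positive definite $A$ and $g=\nabla f(x)=Ax$, one has $f(x-\delta g)-f(x)=-\delta\|g\|^2+\frac{1}{2}\delta^2 g^TAg$, so Armijo's condition forces $\delta\leq 2(1-\alpha)\|g\|^2/(g^TAg)\leq 2(1-\alpha)/\lambda_{\min}(A)$. The code runs a plain Backtracking GD loop that deliberately starts each search from a trial rate above this value and records the accepted learning rates.

```python
import numpy as np


def armijo_holds(f, x, g, delta, alpha):
    """Armijo's condition: f(x - delta*g) - f(x) <= -alpha*delta*||g||^2."""
    return f(x - delta * g) - f(x) <= -alpha * delta * np.dot(g, g)


def backtracking_gd(f, grad_f, x0, delta0, alpha, beta, n_steps):
    """Illustrative Backtracking GD: at every step start from delta0 and
    multiply the trial learning rate by beta until Armijo's condition holds."""
    x = np.asarray(x0, dtype=float)
    deltas = []
    for _ in range(n_steps):
        g = grad_f(x)
        if np.dot(g, g) == 0.0:  # exact critical point reached, nothing to do
            break
        delta = delta0
        while not armijo_holds(f, x, g, delta, alpha):
            delta *= beta
        x = x - delta * g
        deltas.append(delta)
    return x, deltas


# Assumed test problem (not from the paper): a positive definite quadratic
# whose only critical point, the origin, is non-degenerate.
A = np.diag([1.0, 10.0])


def f(x):
    return 0.5 * x @ A @ x


def grad_f(x):
    return A @ x


alpha = 0.25
x0 = np.array([1.0, 1.0])
x_final, deltas = backtracking_gd(
    f, grad_f, x0, delta0=4.0, alpha=alpha, beta=0.5, n_steps=100
)

# Special-case bound derived in the lead-in: 2 * (1 - alpha) / lambda_min(A).
bound = 2.0 * (1.0 - alpha) / np.linalg.eigvalsh(A).min()
print("largest accepted learning rate:", max(deltas))
print("bound for this quadratic:", bound)
```

The trial rate `delta0=4.0` is set above the bound on purpose, so every accepted learning rate is one that Armijo's condition itself has cut down. The general statement in the abstract involves the norms of the Hessian and of its inverse at the limit point; this quadratic example only shows the simplest instance of that phenomenon.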
