Paper Title
On the Convergence of AdaGrad(Norm) on $\R^{d}$: Beyond Convexity, Non-Asymptotic Rate and Acceleration
Paper Authors
Paper Abstract
Existing analysis of AdaGrad and other adaptive methods for smooth convex optimization is typically for functions with bounded domain diameter. In unconstrained problems, previous works guarantee an asymptotic convergence rate without an explicit constant factor that holds true for the entire function class. Furthermore, in the stochastic setting, only a modified version of AdaGrad, different from the one commonly used in practice, in which the latest gradient is not used to update the stepsize, has been analyzed. Our paper aims at bridging these gaps and developing a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions as well as the more general setting of quasar convex functions. First, we demonstrate new techniques to explicitly bound the convergence rate of the vanilla AdaGrad for unconstrained problems in both deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we can show the convergence of the last iterate, instead of the average iterate. Finally, we give new accelerated adaptive algorithms and their convergence guarantee in the deterministic setting with explicit dependency on the problem parameters, improving upon the asymptotic rate shown in previous works.
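For concreteness, here is a minimal sketch of the AdaGrad-Norm step-size rule referenced above; the symbols $\eta$ (base step size) and $b_0$ (initial accumulator) are assumed hyperparameters and are not taken from the paper. The vanilla update includes the latest gradient in the step-size denominator, while the modified variant analyzed in earlier stochastic work omits it:
$$x_{t+1} = x_t - \frac{\eta}{\sqrt{b_0^2 + \sum_{s=1}^{t} \|\nabla f(x_s)\|^2}}\,\nabla f(x_t) \quad \text{(vanilla: latest gradient included)},$$
$$x_{t+1} = x_t - \frac{\eta}{\sqrt{b_0^2 + \sum_{s=1}^{t-1} \|\nabla f(x_s)\|^2}}\,\nabla f(x_t) \quad \text{(modified: latest gradient excluded)}.$$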