Paper Title

An Improved Analysis of Stochastic Gradient Descent with Momentum

Paper Authors

Yanli Liu, Yuan Gao, Wotao Yin

Paper Abstract

SGD with momentum (SGDM) has been widely applied in many machine learning tasks, and it is often used with dynamic stepsizes and momentum weights tuned in a stagewise manner. Despite its empirical advantage over SGD, the role of momentum is still unclear in general, since previous analyses of SGDM either provide worse convergence bounds than those of SGD, or assume Lipschitz or quadratic objectives, which fail to hold in practice. Furthermore, the role of dynamic parameters has not been addressed. In this work, we show that SGDM converges as fast as SGD for smooth objectives under both strongly convex and nonconvex settings. We also establish the first convergence guarantee for the multistage setting, and show that the multistage strategy is beneficial for SGDM compared to using fixed parameters. Finally, we verify these theoretical claims by numerical experiments.
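For context, the abstract does not spell out the update rule, but the heavy-ball form of SGDM with a stagewise schedule can be written as v_{t+1} = beta * v_t + g_t and x_{t+1} = x_t - alpha * v_{t+1}, with (alpha, beta) held fixed within each stage. Below is a minimal sketch of that scheme; it assumes a generic stochastic-gradient oracle, and the function name, stage format, and toy objective are illustrative choices, not the paper's notation or code.

```python
import numpy as np

def sgdm_multistage(grad_oracle, x0, stages):
    """Heavy-ball SGDM with a stagewise (stepsize, momentum) schedule.

    grad_oracle(x) returns a stochastic gradient estimate at x.
    `stages` is a list of (num_iters, stepsize, momentum) tuples;
    this interface is a hypothetical sketch, not the paper's API.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                # momentum buffer
    for num_iters, alpha, beta in stages:
        for _ in range(num_iters):
            g = grad_oracle(x)          # stochastic gradient sample
            v = beta * v + g            # heavy-ball momentum accumulation
            x = x - alpha * v           # parameter step
    return x

# Toy usage: minimize E[(x - 1)^2 / 2] from noisy gradients, shrinking
# the stepsize from one stage to the next (a common multistage schedule).
rng = np.random.default_rng(0)
oracle = lambda x: (x - 1.0) + 0.1 * rng.standard_normal(x.shape)
x_final = sgdm_multistage(oracle, np.zeros(1),
                          stages=[(200, 0.1, 0.9), (200, 0.01, 0.9)])
print(x_final)  # close to 1.0
```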
