Paper Title

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Paper Authors

Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora

Paper Abstract

Recent works (e.g., Li and Arora, 2020) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e.g., the use of exponentially increasing learning rates. The current paper highlights other ways in which the behavior of normalized nets departs from traditional viewpoints, and then initiates a formal framework for studying their mathematics via a suitable adaptation of the conventional framework, namely, modeling the SGD-induced training trajectory via a suitable stochastic differential equation (SDE) with a noise term that captures gradient noise. This yields: (a) A new 'intrinsic learning rate' parameter that is the product of the normal learning rate and the weight decay factor. Analysis of the SDE shows how the effective speed of learning varies and equilibrates over time under the control of the intrinsic LR. (b) A challenge, via theory and experiments, to the popular belief that good generalization requires large learning rates at the start of training. (c) New experiments, backed by mathematical intuition, suggesting that the number of steps to equilibrium (in function space) scales as the inverse of the intrinsic learning rate, as opposed to the exponential-time convergence bound implied by SDE analysis. We name this the Fast Equilibrium Conjecture and suggest it holds the key to why Batch Normalization is effective.
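As a minimal sketch in our own notation (not code from the paper): the intrinsic learning rate is simply the product of the SGD learning rate and the weight decay factor, and the Fast Equilibrium Conjecture stated in the abstract suggests the number of steps to equilibrate in function space scales like its inverse, rather than exponentially.

```python
# Minimal sketch, assuming plain SGD hyperparameters; function name is ours, not the paper's.
def intrinsic_lr(learning_rate: float, weight_decay: float) -> float:
    # Intrinsic LR = (normal learning rate) x (weight decay factor), per the abstract.
    return learning_rate * weight_decay

# Example: a common setup lr = 0.1, wd = 5e-4 gives an intrinsic LR of 5e-5.
# Under the Fast Equilibrium Conjecture, equilibration would take on the order of
# 1 / 5e-5 = 2e4 steps (up to constants), rather than exponentially many steps.
print(intrinsic_lr(0.1, 5e-4))  # 5e-05
```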
