Paper Title
Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
Paper Authors
Paper Abstract
A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with a varying effective learning rate (ELR), which has been studied previously. However, the varying ELR may obscure certain characteristics of the intrinsic loss landscape structure. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail, both through a theoretical examination of a toy example and through a thorough empirical analysis of real scale-invariant deep learning models. Each regime has unique features and reflects specific properties of the intrinsic loss landscape, some of which have strong parallels with previous research on both regular and scale-invariant neural network training. Finally, we demonstrate how the discovered regimes are reflected in conventional training of normalized networks and how they can be leveraged to achieve better optima.
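The fixed-ELR spherical training described in the abstract can be made concrete with a small sketch. The snippet below is an illustrative assumption, not the authors' implementation: `spherical_sgd_step` and the toy quadratic objective are hypothetical, and `elr` stands in for the fixed effective learning rate. It shows one projected-gradient step with retraction back onto the unit sphere; sweeping `elr` is how one would probe the convergence, chaotic-equilibrium, and divergence regimes.

```python
import numpy as np

def spherical_sgd_step(w, grad, elr):
    """One projected-gradient step on the unit sphere with a fixed ELR.

    Illustrative sketch only: `grad` is the Euclidean gradient of the loss
    at `w`, and `elr` plays the role of the fixed effective learning rate.
    """
    # Project the gradient onto the tangent space of the sphere at w.
    # For an exactly scale-invariant loss this projection changes nothing,
    # since the gradient is already orthogonal to w.
    tangent_grad = grad - np.dot(grad, w) * w
    # Take the step and retract back onto the unit sphere.
    w_new = w - elr * tangent_grad
    return w_new / np.linalg.norm(w_new)

# Toy usage: a scale-invariant quadratic form f(w) = (w/||w||)^T A (w/||w||).
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
A = A + A.T
w = rng.standard_normal(10)
w /= np.linalg.norm(w)
for _ in range(100):
    # Euclidean gradient of w^T A w; the projection inside the step turns it
    # into the spherical gradient of the scale-invariant objective.
    grad = 2 * A @ w
    w = spherical_sgd_step(w, grad, elr=0.05)
```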