Title
Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, & Gradient Flow Dynamics
Authors
Abstract
Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented. We propose reparametrizing ReLU NNs as continuous piecewise linear splines. Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum. We also show that standard weight initializations yield very flat functions, and that this flatness, together with overparametrization and the initial weight scale, is responsible for the strength and type of implicit regularization, consistent with recent work arXiv:1906.05827. Our implicit regularization results are complementary to recent work arXiv:1906.07842, done independently, which showed that initialization scale critically controls implicit regularization via a kernel-based argument. Our spline-based approach reproduces their key implicit regularization results but in a far more intuitive and transparent manner. Going forward, our spline-based approach is likely to extend naturally to the multivariate and deep settings, and will play a foundational role in efforts to understand neural networks. Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2.
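The spline view in the abstract can be illustrated with a minimal sketch. It assumes the usual shallow form f(x) = sum_i v_i relu(w_i x + b_i) and relies on the elementary fact that unit i has a kink at x = -b_i / w_i, where the slope of f changes by |w_i| v_i when moving left to right. The function names, the width `H`, and the Gaussian random weights are illustrative assumptions, not the paper's notation or initialization scheme.

```python
import numpy as np

def shallow_relu(x, w, b, v):
    """f(x) = sum_i v_i * relu(w_i * x + b_i) for a 1-D array of inputs x."""
    pre = np.outer(x, w) + b              # shape (len(x), H)
    return np.maximum(pre, 0.0) @ v

def spline_params(w, b, v):
    """Sorted breakpoints and the slope jump at each one (assumes all w_i != 0)."""
    beta = -b / w                         # kink location of unit i
    jump = np.abs(w) * v                  # slope change crossing beta_i, left to right
    order = np.argsort(beta)
    return beta[order], jump[order]

# Hypothetical small network with random weights (illustration only).
rng = np.random.default_rng(0)
H = 8
w = rng.normal(size=H)
b = rng.normal(size=H)
v = rng.normal(size=H) / np.sqrt(H)

beta, jump = spline_params(w, b, v)

# Left of every breakpoint only units with w_i < 0 are active.
left_slope = np.sum(np.where(w < 0, w * v, 0.0))

# Slope on each of the H + 1 linear pieces, rebuilt from spline parameters alone.
slopes = left_slope + np.concatenate([[0.0], np.cumsum(jump)])

# Cross-check against finite differences of the network inside each piece.
edges = np.concatenate([[beta[0] - 1.0], beta, [beta[-1] + 1.0]])
mids = 0.5 * (edges[:-1] + edges[1:])
h = 0.25 * (edges[1:] - edges[:-1])       # keeps both sample points inside the piece
fd_slopes = (shallow_relu(mids + h, w, b, v) - shallow_relu(mids - h, w, b, v)) / (2 * h)

print(np.allclose(slopes, fd_slopes, atol=1e-6))
```

The check compares slopes computed from breakpoints and slope jumps with slopes measured directly from the network, which is one way to see that the two parametrizations describe the same continuous piecewise linear function.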