Paper Title
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks
Paper Authors
Paper Abstract
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs), which can be used to analyze neural network training. In this framework, a DNN is represented in the continuous limit by probability measures and functions over its features (that is, the function values of the hidden units over the training data), instead of over the neural network parameters, as most existing studies do. This new representation overcomes the degenerate situation in which all the hidden units in a middle layer essentially collapse into a single meaningful unit, and it further leads to a simpler representation of DNNs, for which the training objective can be reformulated as a convex optimization problem via a suitable re-parameterization. Moreover, we construct a non-linear dynamics called neural feature flow, which captures the evolution of an over-parameterized DNN trained by gradient descent. We illustrate the framework with the standard DNN and the Residual Network (Res-Net) architectures. Furthermore, we show that, for Res-Net, when the neural feature flow process converges, it reaches a globally minimal solution under suitable conditions. Our analysis leads to the first global convergence proof for over-parameterized neural network training with more than $3$ layers in the mean-field regime.
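For context, the following is a minimal sketch of the standard parameter-space mean-field formulation in the well-studied two-layer case, from prior mean-field work rather than from this paper; the symbols $\sigma$, $\rho_t$, and $L$ are illustrative choices, and the paper's own construction instead places measures and functions over features. A width-$m$ two-layer network and its infinite-width limit read
\[
f_m(x) \;=\; \frac{1}{m}\sum_{j=1}^{m} a_j\,\sigma(\langle w_j, x\rangle)
\;\xrightarrow[\;m\to\infty\;]{}\;
f_\rho(x) \;=\; \int a\,\sigma(\langle w, x\rangle)\,\mathrm{d}\rho(a, w),
\]
and gradient descent on the parameters $(a_j, w_j)$ corresponds, in this limit, to a Wasserstein gradient flow on the measure,
\[
\partial_t \rho_t \;=\; \nabla_{(a,w)} \cdot \Big( \rho_t \, \nabla_{(a,w)} \frac{\delta L(\rho_t)}{\delta \rho} \Big),
\]
where $L(\rho)$ denotes the training loss of $f_\rho$. The degeneracy mentioned in the abstract concerns extending this parameter-space picture to middle layers of deep networks; representing each middle layer by its feature values on the training data is what enables the paper's convex reformulation and its neural feature flow dynamics.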