Paper Title
Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates
Paper Authors
Paper Abstract
We consider optimizing two-layer neural networks in the mean-field regime, where the learning dynamics of the network weights can be approximated by the evolution of a probability measure over the weight parameters associated with the neurons. The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime, which applies only locally, in the so-called neural tangent kernel space around specialized initializations. Several prior works (\cite{chizat2018global, mei2018mean}) establish the asymptotic global optimality of the mean-field regime, but obtaining a quantitative convergence rate remains challenging because of the complicated, unbounded nonlinearity of the training dynamics. This work establishes the first linear convergence result for vanilla two-layer neural networks trained by continuous-time noisy gradient descent in the mean-field regime. Our result relies on a novel time-dependent estimate of the logarithmic Sobolev constants for a family of measures determined by the evolving distribution of the hidden neurons.
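To make the mean-field picture concrete, here is a minimal sketch of noisy gradient descent acting on the neurons of a two-layer network. Each neuron (a_i, w_i) is treated as one particle. The output averages over neurons with a 1/N factor, and each particle's gradient is rescaled by N, so that as N grows the empirical distribution of the particles tracks an evolving probability measure. The tanh activation, squared loss, L2 penalty `lam`, noise level `temp`, and the toy target y = tanh(x_0) are all hypothetical choices, not taken from the paper. The paper analyzes continuous-time dynamics, while this is only a discrete-time, Euler-Maruyama-style particle approximation.

```python
import math
import random


def forward(neurons, x):
    """Mean-field two-layer net: f(x) = (1/N) * sum_i a_i * tanh(w_i . x)."""
    total = 0.0
    for a, w in neurons:
        total += a * math.tanh(sum(wj * xj for wj, xj in zip(w, x)))
    return total / len(neurons)


def loss(neurons, data):
    """Squared loss (1/2M) * sum_m (f(x_m) - y_m)^2."""
    return sum((forward(neurons, x) - y) ** 2 for x, y in data) / (2 * len(data))


def noisy_gd_step(neurons, data, lr, lam, temp, rng):
    """One synchronous noisy gradient step on every neuron (particle).

    The loss gradient w.r.t. neuron i carries a 1/N factor; multiplying it by N
    (mean-field time scaling) gives each particle an O(1) velocity.
    lam (L2 penalty) and temp (noise level) are illustrative assumptions.
    """
    m = len(data)
    residuals = [(forward(neurons, x) - y, x) for x, y in data]
    noise_scale = math.sqrt(2.0 * lr * temp)
    updated = []
    for a, w in neurons:
        grad_a = 0.0
        grad_w = [0.0] * len(w)
        for r, x in residuals:
            h = math.tanh(sum(wj * xj for wj, xj in zip(w, x)))
            grad_a += r * h / m
            for j, xj in enumerate(x):
                grad_w[j] += r * a * (1.0 - h * h) * xj / m
        # Gradient step + L2 confinement + Gaussian noise (Langevin-type update).
        new_a = a - lr * (grad_a + lam * a) + noise_scale * rng.gauss(0.0, 1.0)
        new_w = [wj - lr * (gj + lam * wj) + noise_scale * rng.gauss(0.0, 1.0)
                 for wj, gj in zip(w, grad_w)]
        updated.append((new_a, new_w))
    return updated


def train(num_neurons=50, steps=200, lr=0.1, lam=1e-3, temp=1e-4, seed=0):
    """Return (initial_loss, final_loss) on a hypothetical toy regression task."""
    rng = random.Random(seed)
    # Toy data: 20 points in [-2, 2]^2 with target y = tanh(x_0).
    data = []
    for _ in range(20):
        x = [rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)]
        data.append((x, math.tanh(x[0])))
    # i.i.d. initialization: each neuron is one sample from a Gaussian rho_0.
    neurons = [(rng.gauss(0.0, 1.0), [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
               for _ in range(num_neurons)]
    initial = loss(neurons, data)
    for _ in range(steps):
        neurons = noisy_gd_step(neurons, data, lr, lam, temp, rng)
    return initial, loss(neurons, data)


if __name__ == "__main__":
    initial, final = train()
    print(f"loss: {initial:.4f} -> {final:.4f}")
```

Increasing `num_neurons` brings the particle system closer to its mean-field limit. The injected Gaussian noise is what turns the dynamics into a diffusion, the setting in which logarithmic Sobolev inequalities of the kind mentioned in the abstract are typically used to bound convergence rates.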