Paper Title
Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
Paper Authors
Paper Abstract
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least a layer with $\Omega(N)$ neurons, $N$ being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: the number of parameters is roughly $\Omega(N)$ and, hence, the number of neurons is as little as $\Omega(\sqrt{N})$. To showcase the applicability of our NTK bounds, we provide two results concerning memorization capacity and optimization guarantees for gradient descent training.
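The central quantity in the abstract is the smallest eigenvalue of the NTK Gram matrix $K_{ij} = \langle \nabla_\theta f(\theta, x_i), \nabla_\theta f(\theta, x_j) \rangle$. The following is a minimal, illustrative JAX sketch (not the paper's code) that computes this matrix for a small deep ReLU network at random initialization and reports $\lambda_{\min}(K)$; the architecture, widths, and synthetic data are placeholder choices meant only to mimic the regime where the number of parameters is of order $N$ and the hidden widths are of order $\sqrt{N}$.

```python
# Illustrative sketch: empirical NTK Gram matrix and its smallest eigenvalue
# for a small fully connected ReLU network at random initialization.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, widths):
    """He-style Gaussian initialization for a fully connected ReLU network."""
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in))
    return params

def forward(params, x):
    """Scalar-output deep ReLU network f(theta, x) for a single input x."""
    h = x
    for W in params[:-1]:
        h = jax.nn.relu(h @ W)
    return (h @ params[-1]).squeeze()

def ntk_gram(params, X):
    """Empirical NTK Gram matrix K_ij = <grad_theta f(x_i), grad_theta f(x_j)>."""
    flat_grad = lambda x: ravel_pytree(jax.grad(forward)(params, x))[0]
    J = jax.vmap(flat_grad)(X)   # (N, num_params) Jacobian w.r.t. the parameters
    return J @ J.T               # K = J J^T

key = jax.random.PRNGKey(0)
N, d, m = 64, 16, 16             # samples, input dim, hidden width of order sqrt(N)
X = jax.random.normal(key, (N, d))
params = init_params(key, [d, m, m, 1])   # number of parameters is of order N
K = ntk_gram(params, X)
print("smallest NTK eigenvalue:", jnp.linalg.eigvalsh(K)[0])
```

In this sketch the paper's question corresponds to whether the printed $\lambda_{\min}(K)$ stays bounded away from zero (with high probability over the data and initialization) even when the widths scale only like $\sqrt{N}$ rather than $N$.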