Paper Title
Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
Paper Authors
Paper Abstract
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least a layer with $\Omega(N)$ neurons, $N$ being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: the number of parameters is roughly $\Omega(N)$ and, hence, the number of neurons is as little as $\Omega(\sqrt{N})$. To showcase the applicability of our NTK bounds, we provide two results concerning memorization capacity and optimization guarantees for gradient descent training.
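The central quantity in the abstract is the smallest eigenvalue of the NTK Gram matrix $K_{ij} = \langle \nabla_\theta f(\theta, x_i), \nabla_\theta f(\theta, x_j) \rangle$. The following is a minimal, illustrative JAX sketch (not the paper's code) that computes this matrix for a small deep ReLU network at random initialization and reports $\lambda_{\min}(K)$; the architecture, widths, and synthetic data are placeholder choices meant only to mimic the regime where the number of parameters is of order $N$ and the hidden widths are of order $\sqrt{N}$.

```python
# Illustrative sketch: empirical NTK Gram matrix and its smallest eigenvalue
# for a small fully connected ReLU network at random initialization.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, widths):
    """He-style Gaussian initialization for a fully connected ReLU network."""
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in))
    return params

def forward(params, x):
    """Scalar-output deep ReLU network f(theta, x) for a single input x."""
    h = x
    for W in params[:-1]:
        h = jax.nn.relu(h @ W)
    return (h @ params[-1]).squeeze()

def ntk_gram(params, X):
    """Empirical NTK Gram matrix K_ij = <grad_theta f(x_i), grad_theta f(x_j)>."""
    flat_grad = lambda x: ravel_pytree(jax.grad(forward)(params, x))[0]
    J = jax.vmap(flat_grad)(X)   # (N, num_params) Jacobian w.r.t. the parameters
    return J @ J.T               # K = J J^T

key = jax.random.PRNGKey(0)
N, d, m = 64, 16, 16             # samples, input dim, hidden width of order sqrt(N)
X = jax.random.normal(key, (N, d))
params = init_params(key, [d, m, m, 1])   # number of parameters is of order N
K = ntk_gram(params, X)
print("smallest NTK eigenvalue:", jnp.linalg.eigvalsh(K)[0])
```

In this sketch the paper's question corresponds to whether the printed $\lambda_{\min}(K)$ stays bounded away from zero (with high probability over the data and initialization) even when the widths scale only like $\sqrt{N}$ rather than $N$.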