Paper Title
Linear Stability Hypothesis and Rank Stratification for Nonlinear Models
Paper Authors
Paper Abstract
Models with nonlinear architectures/parameterizations, such as deep neural networks (DNNs), are well known for their mysteriously good generalization performance at overparameterization. In this work, we tackle this mystery from a novel perspective, focusing on the transition of target recovery/fitting accuracy as a function of the training data size. We propose a rank stratification for general nonlinear models that uncovers a model rank, an "effective size of parameters," for each function in the function space of the corresponding model. Moreover, we establish a linear stability theory proving that a target function almost surely becomes linearly stable when the training data size equals its model rank. Supported by our experiments, we propose a linear stability hypothesis: linearly stable functions are preferred by nonlinear training. By these results, the model rank of a target function predicts the minimal training data size for its successful recovery. Specifically, for the matrix factorization model and for DNNs with fully-connected or convolutional architectures, our rank stratification shows that the model rank of specific target functions can be far lower than the number of model parameters. This result predicts the target recovery capability of these nonlinear models even at heavy overparameterization, as demonstrated quantitatively by our experiments. Overall, our work provides a unified framework with quantitative predictive power for understanding the mysterious target recovery behavior of general nonlinear models at overparameterization.
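As a concrete illustration of the rank-stratification claim for the matrix factorization model, the minimal sketch below numerically compares a target's model rank with the parameter count. It assumes one natural reading of "model rank": the rank of the Jacobian of the parameterization map (A, B) -> A @ B.T at a parameter point realizing the target, i.e., the dimension of the tangent function space there. The sizes m, n, k, r and all variable names are illustrative choices, not taken from the paper.

```python
import numpy as np

# Illustrative sketch, not the paper's code. Assumption: the "model rank" of
# a target is read as the rank of the Jacobian of (A, B) -> A @ B.T at a
# parameter point realizing that target.
rng = np.random.default_rng(0)
m, n, k, r = 6, 5, 20, 2  # k: factor width (overparameterized); r: target rank

# A rank-r target M* = A* @ B*.T, realized with zero-padded width-k factors.
A = np.zeros((m, k))
A[:, :r] = rng.standard_normal((m, r))
B = np.zeros((n, k))
B[:, :r] = rng.standard_normal((n, r))

def factorization_jacobian(A, B):
    """Exact Jacobian of vec(A @ B.T) w.r.t. all entries of A and B.

    The map is bilinear, so the directional derivative in (dA, dB) is
    exactly dA @ B.T + A @ dB.T; feeding in unit basis directions yields
    the Jacobian columns with no finite-difference error.
    """
    cols = []
    for i in range(A.size):
        dA = np.zeros_like(A)
        dA.flat[i] = 1.0
        cols.append((dA @ B.T).ravel())
    for i in range(B.size):
        dB = np.zeros_like(B)
        dB.flat[i] = 1.0
        cols.append((A @ dB.T).ravel())
    return np.stack(cols, axis=1)  # shape: (m*n, (m+n)*k)

J = factorization_jacobian(A, B)
print("parameter count (m+n)*k :", A.size + B.size)           # 220
print("Jacobian rank           :", np.linalg.matrix_rank(J))  # 18
print("r*(m+n-r)               :", r * (m + n - r))           # 18
```

The printed Jacobian rank coincides with r(m+n-r), the dimension of the manifold of m×n matrices of rank r, and stays fixed as the width k (and hence the parameter count (m+n)k) grows, mirroring the abstract's claim that the model rank of a low-rank target can be far below the number of model parameters.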