Paper Title
Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions
Paper Authors
Paper Abstract
Empirical observations of high-dimensional phenomena, such as the double descent behaviour, have attracted a lot of interest in understanding classical techniques such as kernel methods, and their implications for explaining the generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of samples and the number of covariates grow at a fixed ratio (i.e. proportional asymptotics). In this work we show that for a large class of kernels, including the neural tangent kernel of fully connected networks, kernel methods can only perform as well as linear models in this regime. More surprisingly, when the data are generated by a kernel model in which the relationship between the input and the response can be highly nonlinear, we show that linear models are in fact optimal, i.e. linear models achieve the minimum risk among all models, linear or nonlinear. These results suggest that high-dimensional analysis requires data models more complex than independent features.
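The claim about the proportional regime can be illustrated numerically. Below is a minimal sketch, assuming i.i.d. Gaussian covariates with n/d held at a fixed ratio and a nonlinear single-index target; the dimensions, target function, kernel bandwidth, and regularization strengths are arbitrary choices for demonstration and are not taken from the paper. In this setting, kernel ridge regression with an RBF kernel and plain linear ridge regression typically attain comparable test error.

# Illustrative simulation (assumed setup, not the paper's exact setting):
# i.i.d. Gaussian covariates, n/d fixed, nonlinear target; compare kernel
# ridge regression (RBF kernel) against linear ridge regression.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
d = 400                         # covariate dimension
n_train, n_test = 2 * d, 2000   # proportional regime: n/d fixed (here 2)

# i.i.d. standard Gaussian covariates
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))

# Nonlinear single-index target: linear part plus a nonlinearity of a projection
beta = rng.standard_normal(d) / np.sqrt(d)
f = lambda X: X @ beta + np.cos(X @ beta)
y_train = f(X_train) + 0.1 * rng.standard_normal(n_train)
y_test = f(X_test)

# Kernel ridge regression with an RBF kernel vs. linear ridge regression
krr = KernelRidge(kernel="rbf", gamma=1.0 / d, alpha=1e-3).fit(X_train, y_train)
lin = Ridge(alpha=1e-3).fit(X_train, y_train)

print("Kernel ridge test MSE:", np.mean((krr.predict(X_test) - y_test) ** 2))
print("Linear ridge test MSE:", np.mean((lin.predict(X_test) - y_test) ** 2))

Running such a simulation for several values of d (with n/d fixed) shows the two test errors tracking each other closely, which is the behaviour the abstract describes for this regime.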