A geometrical viewpoint on the benign overfitting property of the minimum $l_2$-norm interpolant estimator and its universality

Paper title

A geometrical viewpoint on the benign overfitting property of the minimum $l_2$-norm interpolant estimator and its universality

Authors

Guillaume Lecué, Zong Shang

Abstract

In the linear regression model, the minimum $l_2$-norm interpolant estimator has received much attention since it was proved to be consistent even though it fits noisy data perfectly, under some condition on the covariance matrix $\Sigma$ of the input vector; this is known as benign overfitting. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from [Tsigler and Bartlett]. Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [Bartlett, Montanari and Rakhlin]: the minimum $l_2$-norm interpolant estimator can be written as a sum of a ridge estimator and an overfitting component. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and the isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension, which appears naturally in our geometrical viewpoint, coincides with the effective rank and is the key tool for handling the behavior of the design matrix restricted to the subspace where overfitting happens. We extend these results to heavy-tailed scenarios, proving the universality of this phenomenon beyond exponential moment assumptions. This phenomenon was previously unknown and is widely believed to be a significant challenge. It follows from an anisotropic version of the probabilistic Dvoretzky-Milman theorem that holds for heavy-tailed vectors, which is of independent interest.
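
The decomposition mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `min_norm_interpolant` and `split_interpolant`, the split index `k`, and the synthetic Gaussian design are all assumptions made for illustration. It splits the features into the first `k` coordinates and the remaining ones, computes a generalized ridge estimator on the first block (regularized by the Gram matrix of the second block) and a minimum-norm interpolant of the residual on the second block, and compares their concatenation with the minimum $l_2$-norm interpolant.

```python
import numpy as np


def min_norm_interpolant(X, y):
    """Minimum l2-norm solution of X @ beta = y (assumes p >= n, X of full row rank)."""
    return np.linalg.pinv(X) @ y


def split_interpolant(X, y, k):
    """Hypothetical helper: rebuild the interpolant from two blocks of features.

    b1 is a generalized ridge estimator on the first k features, regularized
    by A = X2 @ X2.T; b2 is the minimum-norm interpolant of the residual on
    the remaining features (the overfitting component).
    """
    X1, X2 = X[:, :k], X[:, k:]
    A = X2 @ X2.T  # n x n Gram matrix of the tail features (assumed invertible)
    b1 = X1.T @ np.linalg.solve(X1 @ X1.T + A, y)
    b2 = X2.T @ np.linalg.solve(A, y - X1 @ b1)
    return b1, b2


# Synthetic example (assumed setup): a few strong directions, many weak ones.
rng = np.random.default_rng(0)
n, p, k = 20, 200, 5
scales = np.concatenate([np.full(k, 10.0), np.full(p - k, 0.3)])
X = rng.standard_normal((n, p)) * scales
beta_star = np.zeros(p)
beta_star[:k] = 1.0
y = X @ beta_star + 0.5 * rng.standard_normal(n)

beta_hat = min_norm_interpolant(X, y)
b1, b2 = split_interpolant(X, y, k)

print("interpolates the data:", np.allclose(X @ beta_hat, y))
print("blocks rebuild beta_hat:", np.allclose(np.concatenate([b1, b2]), beta_hat))
```

In this sketch, when the tail Gram matrix `A` is close to a multiple of the identity, `b1` is close to an ordinary ridge estimator on the first `k` features with that multiple as the regularization parameter.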

Paper title