Paper Title

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

Paper Authors

Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao

Paper Abstract

We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^*\in \mathbb{R}^{d\times d}$ from a noisy observation $Y$ using an over-parameterized model. We parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in \mathbb{R}^{d\times d}$. We then show that under mild conditions, the estimator obtained by the randomly perturbed gradient descent algorithm using the square loss function attains a mean square error of $O(\sigma^2/d)$, where $\sigma^2$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides a new understanding of training over-parameterized neural networks.
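
To make the setting concrete, below is a minimal sketch of randomly perturbed gradient descent on the over-parameterized objective, taking the squared loss with one common scaling, $f(X) = \frac{1}{4}\|XX^\top - Y\|_F^2$. The dimension, step size, perturbation scale, and iteration count are illustrative assumptions, not values prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50        # ambient dimension (assumed for illustration)
sigma = 0.1   # observational noise level (assumed)

# Ground truth: rank one Y* = x* x*^T, observed with entrywise Gaussian noise.
x_star = rng.normal(size=(d, 1))
Y_star = x_star @ x_star.T
Y = Y_star + sigma * rng.normal(size=(d, d))

def grad(X, Y):
    """Exact gradient of f(X) = ||X X^T - Y||_F^2 / 4."""
    E = X @ X.T - Y
    return 0.5 * (E + E.T) @ X

# Over-parameterization: X is a full d x d matrix rather than a d-vector.
X = 0.1 * rng.normal(size=(d, d))
eta = 2e-3    # step size (assumed)
nu = 1e-2     # perturbation scale (assumed)

for t in range(5000):
    # Randomly perturbed gradient descent: a plain gradient step plus
    # isotropic Gaussian noise injected into every iterate.
    X = X - eta * grad(X, Y) + nu * rng.normal(size=(d, d))

# Mean square error of the estimator X X^T against the ground truth Y*.
mse = np.mean((X @ X.T - Y_star) ** 2)
print(f"MSE vs Y*: {mse:.4f}")
```

Setting `nu = 0` recovers plain gradient descent, which, per the abstract, only attains the larger $O(\sigma^2)$ mean square error in this setting.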
