Paper title
Gradient flow in the Gaussian covariate model: exact solution of learning curves and multiple descent structures
Authors
Abstract
A recent line of work has shown remarkable behaviors of the generalization error curves in simple learning models. Even least-squares regression exhibits atypical features such as model-wise double descent, and further works have observed triple or multiple descents. Another important characteristic is the epoch-wise descent structure that emerges during training. The observations of model-wise and epoch-wise descents have been analytically derived in limited theoretical settings (such as the random feature model) and are otherwise experimental. In this work, we provide a full and unified analysis of the whole time evolution of the generalization curve, in the asymptotic large-dimensional regime and under gradient flow, within a wider theoretical setting stemming from a Gaussian covariate model. In particular, we cover most cases already disparately observed in the literature, and also provide examples of the existence of multiple descent structures as a function of a model parameter or time. Furthermore, we show that our theoretical predictions adequately match the learning curves obtained by gradient descent over realistic datasets. Technically, we compute averages of rational expressions involving random matrices using recent developments in random matrix theory based on "linear pencils". Another contribution, which is also of independent interest in random matrix theory, is a new derivation of related fixed-point equations (and an extension thereof) using Dyson Brownian motion.