Paper Title

Analytic Study of Double Descent in Binary Classification: The Impact of Loss

Authors

Ganesh Kini, Christos Thrampoulidis

Abstract

Extensive empirical evidence reveals that, for a wide range of learning methods and datasets, the risk curve exhibits a double-descent (DD) trend as a function of the model size. In a recent paper [Deng, Kammoun, Thrampoulidis, 2019], the authors studied binary linear classification models and showed that the test error of gradient descent (GD) with logistic loss undergoes a DD. In this paper, we complement those results by extending them to GD with square loss. We show that the DD phenomenon persists, but we also identify several differences compared to the logistic loss. This emphasizes that crucial features of DD curves (such as their transition thresholds and global minima) depend on both the training data and the learning algorithm. We further study the dependence of DD curves on the size of the training set. As in that earlier work, our results are analytic: we plot the DD curves by first deriving sharp asymptotics for the test error under Gaussian features. Albeit simple, these models permit a principled study of DD features, the outcomes of which theoretically corroborate related empirical findings arising in more complex learning tasks.
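The qualitative phenomenon the abstract describes can be reproduced in simulation. Below is a minimal sketch, not the paper's asymptotic analysis: GD on the square loss from zero initialization converges to the minimum-norm least-squares solution, which `np.linalg.pinv` computes in closed form, so sweeping the number of features used by the learner past the interpolation threshold traces out a double-descent test-error curve on Gaussian features with noisy binary labels. All concrete values here (`n`, `d_total`, the 10% label-flip rate, the grid of model sizes) are illustrative assumptions, not parameters from the paper.

```python
# Minimal double-descent simulation for binary linear classification with
# square loss (a sketch under the assumptions stated above, not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)

n, d_total = 100, 400          # training samples, ambient feature dimension
beta = rng.standard_normal(d_total)
beta /= np.linalg.norm(beta)   # unit-norm ground-truth direction

def sample(m):
    """Gaussian features; binary labels from a noisy linear rule (10% flips)."""
    X = rng.standard_normal((m, d_total))
    y = np.sign(X @ beta)
    flips = rng.random(m) < 0.10
    y[flips] *= -1
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(20_000)

test_err = {}
for p in range(10, d_total + 1, 10):   # model size = number of features used
    # Min-norm least-squares fit on the first p features: the solution that
    # GD with square loss from zero initialization converges to.
    w = np.linalg.pinv(X_train[:, :p]) @ y_train
    pred = np.sign(X_test[:, :p] @ w)
    test_err[p] = np.mean(pred != y_test)

peak = max(test_err, key=test_err.get)
print(f"test error peaks at p = {peak} (interpolation threshold near n = {n})")
```

In this toy setup the error peak appears near p ≈ n; the paper's sharp asymptotics characterize where such transition thresholds and global minima actually sit, and how they differ between square and logistic loss.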
