Paper Title
Plateau in Monotonic Linear Interpolation -- A "Biased" View of Loss Landscape for Deep Networks
Paper Authors
Paper Abstract
Monotonic linear interpolation (MLI), the phenomenon that the loss and accuracy are monotonic along the line connecting a random initialization to the minimizer it converges to, is commonly observed in the training of neural networks. This phenomenon may seem to suggest that optimizing neural networks is easy. In this paper, we show that the MLI property is not necessarily related to the hardness of the optimization problem, and that empirical observations of MLI for deep neural networks depend heavily on the biases. In particular, we show that linearly interpolating the weights and the biases has very different influences on the final output, and that when different classes have different last-layer biases in a deep network, both the loss and the accuracy interpolation exhibit a long plateau (which existing theories of MLI cannot explain). We also show, using a simple model, how the last-layer biases for different classes can differ even on a perfectly balanced dataset. Empirically, we demonstrate that similar intuitions hold on practical networks and realistic datasets.
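The MLI curve the abstract describes is straightforward to state in code. Below is a minimal PyTorch sketch, not taken from the paper: the toy two-layer classifier, the synthetic batch, and the helper loss_at are all illustrative assumptions. Separate coefficients alpha_w and alpha_b let one interpolate the weights and the biases independently, mirroring the abstract's point that the two have very different influences on the output; setting them equal recovers plain MLI.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a small classifier on a synthetic batch stands in for
# the trained deep networks studied in the paper.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(128, 20)
y = torch.randint(0, 10, (128,))
loss_fn = nn.CrossEntropyLoss()

# theta_0: parameters at random initialization.
theta_0 = [p.detach().clone() for p in model.parameters()]

# A short training run; theta_1 plays the role of the minimizer that the
# initialization converges to.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
theta_1 = [p.detach().clone() for p in model.parameters()]

@torch.no_grad()
def loss_at(alpha_w, alpha_b):
    # Set each parameter to (1 - a) * theta_0 + a * theta_1, where a is
    # alpha_b for bias vectors and alpha_w for weight matrices, then
    # evaluate the loss at the interpolated point.
    for (name, p), p0, p1 in zip(model.named_parameters(), theta_0, theta_1):
        a = alpha_b if name.endswith("bias") else alpha_w
        p.copy_((1 - a) * p0 + a * p1)
    return loss_fn(model(x), y).item()

alphas = [i / 10 for i in range(11)]
# MLI holds on this run if the first curve is (near-)monotonically decreasing.
print("plain MLI:  ", [round(loss_at(a, a), 3) for a in alphas])
# Interpolating only the biases (weights held at the minimizer) probes their
# separate effect on the output, in the spirit of the abstract's observation.
print("biases only:", [round(loss_at(1.0, a), 3) for a in alphas])
```

On a toy problem like this, both printed curves are just diagnostics; the paper's claim concerns how, in deep networks with class-dependent last-layer biases, the full interpolation curve develops a long plateau rather than decreasing smoothly.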