Paper Title

Deforming the Loss Surface

Authors

Chen, Liangming; Jin, Long; Du, Xiujuan; Li, Shuai; Liu, Mei

Abstract

In deep learning, the shape of the loss surface is usually assumed to be fixed. In contrast, this paper first proposes a novel concept of a deformation operator to deform the loss surface and thereby improve optimization. The deformation function, a type of deformation operator, can improve generalization performance. Moreover, various deformation functions are designed, and their contributions to the loss surface are further analyzed. The original stochastic gradient descent (SGD) optimizer is then theoretically proved to be a flat-minima filter that filters out sharp minima. Furthermore, flatter minima can be obtained by exploiting the proposed deformation functions, which is verified on CIFAR-100 by visualizing the loss landscapes near the critical points found by both the original optimizer and the optimizer enhanced with deformation functions. The experimental results show that deformation functions do find flatter regions. Moreover, on ImageNet, CIFAR-10, and CIFAR-100, popular convolutional neural networks enhanced with deformation functions are compared with the corresponding original models, and significant improvements are observed on all of the involved models. For example, the top-1 test accuracy of ResNet-20 on CIFAR-100 increases by 1.46%, with insignificant additional computational overhead.
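The abstract does not reproduce the paper's specific deformation functions, but the core mechanism it describes can be sketched: a monotone deformation f applied to the loss preserves the locations of all minima while reshaping the surface, and by the chain rule the gradient of the deformed loss is the original gradient rescaled by f'(L). The sketch below uses a hypothetical deformation f(L) = ln(1 + L) on a toy one-dimensional quadratic loss; the function choice, learning rate, and toy loss are all illustrative assumptions, not the paper's method.

```python
import math

def loss(w):
    # Hypothetical stand-in for a training loss (illustrative only).
    return (w - 2.0) ** 2

def grad_loss(w):
    return 2.0 * (w - 2.0)

# A hypothetical monotone deformation function f(L) = ln(1 + L).
# Monotonicity keeps every minimizer in place; the chain rule gives
# grad f(L(w)) = f'(L(w)) * grad L(w), so high-loss regions have
# their gradients damped while the surface near minima is unchanged.
def deform_prime(L):
    return 1.0 / (1.0 + L)

def sgd_step(w, lr=0.1, deformed=False):
    g = grad_loss(w)
    if deformed:
        g *= deform_prime(loss(w))  # chain rule through the deformation
    return w - lr * g

# Both the original and the deformed surface share the minimizer w* = 2.
w_plain, w_def = 10.0, 10.0
for _ in range(2000):
    w_plain = sgd_step(w_plain)
    w_def = sgd_step(w_def, deformed=True)
print(round(w_plain, 3), round(w_def, 3))  # both converge to 2.0
```

In the paper's setting the deformation is chosen so that this rescaling biases the optimizer toward flatter basins; here the one-dimensional example only demonstrates that deforming the loss changes the optimization dynamics without moving the minima.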
