Paper Title

AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs

Authors

Shiv Ram Dubey, Satish Kumar Singh, Bidyut Baran Chaudhuri

Abstract

Stochastic gradient descent (SGD) optimizers are generally used to train convolutional neural networks (CNNs). In recent years, several adaptive-momentum SGD optimizers have been introduced, such as Adam, diffGrad, Radam and AdaBelief. However, the existing SGD optimizers do not exploit the gradient norm of past iterations, which leads to poor convergence and performance. In this paper, we propose novel AdaNorm-based SGD optimizers that correct the norm of the gradient in each iteration based on the adaptive training history of the gradient norm. By doing so, the proposed optimizers are able to maintain a high and representative gradient throughout training and to solve the low and atypical gradient problems. The proposed concept is generic and can be used with any existing SGD optimizer. We show the efficacy of the proposed AdaNorm with four state-of-the-art optimizers, including Adam, diffGrad, Radam and AdaBelief. We depict the performance improvement due to the proposed optimizers using three CNN models, including VGG16, ResNet18 and ResNet50, on three benchmark object recognition datasets, including CIFAR10, CIFAR100 and TinyImageNet. Code: https://github.com/shivram1987/AdaNorm.
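As a rough illustration of the gradient-norm-correction idea described in the abstract, the sketch below shows a hypothetical SGD-with-momentum variant that keeps an exponential moving average (EMA) of past gradient norms and rescales any atypically small gradient so that its norm matches that history. The class name `AdaNormSGD`, the hyperparameter `gamma`, and the exact rescaling rule are illustrative assumptions rather than the authors' implementation; refer to the linked repository for the official AdaNorm variants of Adam, diffGrad, Radam and AdaBelief.

```python
import torch
from torch.optim import Optimizer


class AdaNormSGD(Optimizer):
    """Sketch of SGD with gradient-norm correction (AdaNorm idea).

    Keeps an EMA of past gradient norms per parameter and, when the current
    gradient norm falls below that history, scales the gradient up so its
    norm matches the EMA. Names and defaults here are assumptions; see
    https://github.com/shivram1987/AdaNorm for the official optimizers.
    """

    def __init__(self, params, lr=0.01, momentum=0.9, gamma=0.95):
        defaults = dict(lr=lr, momentum=momentum, gamma=gamma)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            lr, momentum, gamma = group['lr'], group['momentum'], group['gamma']
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad
                state = self.state[p]
                if len(state) == 0:
                    state['norm_ema'] = torch.zeros((), device=p.device)
                    state['momentum_buf'] = torch.zeros_like(p)
                # Update the EMA of gradient norms (the "training history").
                grad_norm = grad.norm()
                state['norm_ema'].mul_(gamma).add_(grad_norm, alpha=1 - gamma)
                # If the current gradient is atypically small, scale it up so
                # its norm matches the historical EMA (the norm correction).
                if state['norm_ema'] > grad_norm and grad_norm > 0:
                    grad = grad * (state['norm_ema'] / grad_norm)
                # Plain SGD-with-momentum update using the corrected gradient.
                buf = state['momentum_buf']
                buf.mul_(momentum).add_(grad)
                p.add_(buf, alpha=-lr)
        return loss
```

Under these assumptions the optimizer is used like any other `torch.optim` optimizer, e.g. `optimizer = AdaNormSGD(model.parameters(), lr=0.01)` followed by the usual `loss.backward()` and `optimizer.step()` in the training loop.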
