Paper Title

AsymptoticNG: A regularized natural gradient optimization algorithm with look-ahead strategy

Paper Authors

Zedong Tang, Fenlong Jiang, Junke Song, Maoguo Gong, Hao Li, Fan Yu, Zidong Wang, Min Wang

Paper Abstract

Optimizers that further adjust the scale of the gradient, such as Adam and Natural Gradient (NG), despite being widely studied and used by the community, are often found to generalize worse than Stochastic Gradient Descent (SGD). They tend to converge excellently at the beginning of training but are weak at the end. An immediate idea is to complement the strengths of these algorithms with SGD. However, an abrupt replacement of the optimizer often leads to a collapse of the update pattern, and the new algorithm typically requires many iterations to stabilize its search direction. Driven by this idea and to address this problem, we design and present a regularized natural gradient optimization algorithm with a look-ahead strategy, named Asymptotic Natural Gradient (ANG). According to the total number of iteration steps, ANG dynamically assembles the NG and the Euclidean gradient, and updates the parameters along the new direction using the intensity of NG. Validation experiments on the CIFAR10 and CIFAR100 datasets show that ANG can update smoothly and stably at second-order speed and achieve better generalization performance.
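To make the update rule described above concrete, the following is a minimal, hypothetical NumPy sketch of an ANG-style step. The linear mixing schedule `alpha = step / total_steps`, the damping constant, and the function name `ang_step` are illustrative assumptions; the paper's exact formulation of the assembly and of the NG "intensity" may differ.

```python
import numpy as np

def ang_step(theta, grad, fisher, step, total_steps, lr, damping=1e-3):
    """One ANG-style update (illustrative sketch, not the paper's exact algorithm)."""
    # Regularized natural-gradient direction: (F + damping * I)^{-1} g
    ng_dir = np.linalg.solve(fisher + damping * np.eye(len(grad)), grad)

    # Progress-dependent assembly: start close to NG, drift toward the Euclidean gradient.
    # The linear schedule below is an assumption; the paper defines its own schedule.
    alpha = step / total_steps
    mixed = (1.0 - alpha) * ng_dir + alpha * grad

    # Reuse the "intensity" (step length) of the NG direction for the assembled direction.
    mixed *= np.linalg.norm(ng_dir) / (np.linalg.norm(mixed) + 1e-12)

    return theta - lr * mixed
```

The point the abstract emphasizes is that the search direction drifts gradually from NG toward the Euclidean gradient over training while keeping the NG step length, so the transition to SGD-like behavior avoids an abrupt optimizer switch.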
