Paper Title

An Internal Covariate Shift Bounding Algorithm for Deep Neural Networks by Unitizing Layers' Outputs

Authors

Huang, You; Yu, Yuanlong

Abstract

Batch Normalization (BN) techniques have been proposed to reduce the so-called Internal Covariate Shift (ICS) by attempting to keep the distributions of layer outputs unchanged. Experiments have shown their effectiveness in training deep neural networks. However, because these techniques control only the first two moments, they appear to impose only a weak constraint on layer distributions, and whether such a constraint can reduce ICS is unknown. This paper therefore proposes a measure of ICS based on the Earth Mover (EM) distance and derives upper and lower bounds for the measure to provide a theoretical analysis of BN. The upper bound shows that BN techniques can control ICS only for outputs with low dimensions and small noise; their control is not effective in other cases. The paper also proves that such control merely bounds ICS rather than reducing it. Meanwhile, the analysis shows that high-order moments and noise, which BN cannot control, have a great impact on the lower bound. Based on this analysis, the paper further proposes an algorithm that unitizes the outputs with an adjustable parameter to further bound ICS and thereby cope with the problems of BN. The upper bound for the proposed unitization is noise-free and dominated only by the parameter, so the parameter can be trained to tune the bound and further control ICS. In addition, the unitization is embedded into the BN framework to reduce information loss. Experiments show that the proposed algorithm outperforms existing BN techniques on the CIFAR-10, CIFAR-100 and ImageNet datasets.
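
The abstract's point that matching the first two moments is a weak constraint can be illustrated with a toy sketch. This is not the paper's ICS measure or its unitization algorithm: the helper names `standardize`, `em_distance_1d` and `unitize` are hypothetical, the EM distance is the simple empirical one-dimensional form for equal-size samples, and `unitize` is plain L2 normalization without the adjustable parameter the abstract mentions.

```python
import math
import random


def standardize(xs):
    """Shift and scale a sample to zero mean and unit variance,
    i.e. fix the two moments that BN-style normalization controls."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    std = math.sqrt(var)
    return [(x - mean) / std for x in xs]


def em_distance_1d(xs, ys):
    """Empirical 1-D Earth Mover's distance between two equal-size samples:
    the mean absolute difference between their sorted values."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def unitize(vec):
    """Rescale a vector to unit Euclidean norm.
    Assumption: plain L2 normalization, not the paper's parameterized version."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


rng = random.Random(0)
n = 10_000

# Two samples with different shapes, both forced to mean 0 and variance 1.
gaussian = standardize([rng.gauss(0.0, 1.0) for _ in range(n)])
skewed = standardize([rng.expovariate(1.0) for _ in range(n)])

# The first two moments agree, yet the distributions are still far apart.
gap = em_distance_1d(gaussian, skewed)
print(f"EM distance after standardization: {gap:.3f}")

# Unitizing maps any nonzero output vector onto the unit sphere.
u = unitize([3.0, 4.0])
print(u)
```

The two standardized samples share the same mean and variance, but their EM distance stays clearly above zero because the skewness and higher moments differ. Normalizing a vector to unit length, by contrast, bounds its magnitude regardless of the sample's shape.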
