Gradient Mask: Lateral Inhibition Mechanism Improves Performance in Artificial Neural Networks

Paper Title

Gradient Mask: Lateral Inhibition Mechanism Improves Performance in Artificial Neural Networks

Authors

Lei Jiang, Yongqing Liu, Shihai Xiao, Yansong Chua

Abstract

Lateral inhibitory connections have been observed in the cortex of the biological brain, and their role in cognitive functions has been studied extensively. However, in the vanilla version of backpropagation in deep learning, all gradients (which can be understood to comprise both signal and noise gradients) flow through the network during weight updates. This may lead to overfitting. In this work, inspired by biological lateral inhibition, we propose Gradient Mask, which effectively filters out noise gradients during backpropagation. This allows the learned feature information to be stored more intensively in the network while noisy or unimportant features are filtered out. Furthermore, we demonstrate analytically how lateral inhibition in artificial neural networks improves the quality of propagated gradients. We propose a new criterion for gradient quality that can be used as a measure during the training of various convolutional neural networks (CNNs). Finally, we conduct several experiments to study how Gradient Mask improves the performance of the network both quantitatively and qualitatively. Quantitatively, accuracy in the original CNN architecture, accuracy after pruning, and accuracy after adversarial attacks all show improvements. Qualitatively, a CNN trained using Gradient Mask develops saliency maps that focus primarily on the object of interest, which is useful for data augmentation and network interpretability.
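
The abstract does not define how the mask is computed, so the following is only an illustrative sketch of the general idea of filtering gradients before a weight update. The function names `gradient_mask` and `sgd_step`, the `keep_ratio` parameter, and the top-magnitude selection rule are all assumptions made for this example and are not taken from the paper.

```python
def gradient_mask(grads, keep_ratio=0.5):
    """Zero out the smallest-magnitude gradients, keeping the top fraction.

    ASSUMPTION: keeping the largest-magnitude entries is a hypothetical
    stand-in rule; the paper's actual mask is not given in the abstract.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(grads) * keep_ratio))
    # Rank positions by gradient magnitude, largest first.
    ranked = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    keep = set(ranked[:n_keep])
    # Gradients outside the kept set are treated as "noise" and suppressed.
    return [g if i in keep else 0.0 for i, g in enumerate(grads)]


def sgd_step(weights, grads, lr=0.1, keep_ratio=0.5):
    """One plain SGD update that applies only the gradients surviving the mask."""
    masked = gradient_mask(grads, keep_ratio)
    return [w - lr * g for w, g in zip(weights, masked)]
```

With `keep_ratio=0.5`, only the half of the weights with the strongest gradients is updated in a given step, while the rest are left unchanged. A real implementation would apply such a mask to tensors inside the backward pass of a network rather than to a flat list.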
