Title
Removing Batch Normalization Boosts Adversarial Training
Authors
Abstract
Adversarial training (AT) defends deep neural networks against adversarial attacks. One challenge that limits its practical application is the performance degradation on clean samples. A major bottleneck identified by previous works is the widely used batch normalization (BN), which struggles to model the different statistics of clean and adversarial training samples in AT. While the dominant approach is to extend BN to capture this mixture of distributions, we propose to eliminate the bottleneck entirely by removing all BN layers in AT. Our normalizer-free robust training (NoFrost) method extends recent advances in normalizer-free networks to AT, exploiting their unexplored advantage in handling the mixture-distribution challenge. We show that NoFrost achieves adversarial robustness with only a minor sacrifice in clean-sample accuracy. On ImageNet with ResNet50, NoFrost achieves $74.06\%$ clean accuracy, a drop of merely $2.00\%$ from standard training. In contrast, BN-based AT obtains $59.28\%$ clean accuracy, suffering a significant $16.78\%$ drop from standard training. In addition, NoFrost achieves $23.56\%$ adversarial robustness against the PGD attack, improving over the $13.57\%$ robustness of BN-based AT. We observe better model smoothness and larger decision margins with NoFrost, which make the model less sensitive to input perturbations and thus more robust. Moreover, when more data augmentations are incorporated into NoFrost, it achieves comprehensive robustness against multiple distribution shifts. Code and pre-trained models are publicly available at https://github.com/amazon-research/normalizer-free-robust-training.
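The inner maximization that AT performs, crafting a worst-case perturbation of each training sample before the weight update, can be illustrated with a minimal PGD sketch. The model below is a toy softmax-linear classifier in NumPy, not the paper's ResNet50, and the names (`pgd_attack`, `eps`, `alpha`, `steps`) are illustrative assumptions, not the released implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pgd_attack(x, y, W, b, eps=0.1, alpha=0.02, steps=10):
    """PGD on the input within an L-infinity ball of radius eps.

    Toy linear model: logits = W @ x + b, cross-entropy loss on label y.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        grad_logits = p.copy()
        grad_logits[y] -= 1.0             # d(cross-entropy)/d(logits)
        grad_x = W.T @ grad_logits        # chain rule back to the input
        x_adv = x_adv + alpha * np.sign(grad_x)   # ascent step: maximize loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
    return x_adv

# Hypothetical usage on random data:
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
x = rng.standard_normal(5)
x_adv = pgd_attack(x, y=0, W=W, b=b)
```

Each iteration ascends the loss along the gradient sign and projects back into the $\ell_\infty$ ball around the clean input; AT then trains the (here, normalizer-free) network on `x_adv` in place of `x`, which is where the clean/adversarial statistics mixture that troubles BN arises.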