Paper Title
Fast is better than free: Revisiting adversarial training
Paper Authors
Paper Abstract
Adversarial training, a method for learning robust deep networks, is typically assumed to be more expensive than traditional training due to the necessity of constructing adversarial examples via a first-order method like projected gradient descent (PGD). In this paper, we make the surprising discovery that it is possible to train empirically robust models using a much weaker and cheaper adversary, an approach that was previously believed to be ineffective, rendering the method no more costly than standard training in practice. Specifically, we show that adversarial training with the fast gradient sign method (FGSM), when combined with random initialization, is as effective as PGD-based training but has significantly lower cost. Furthermore, we show that FGSM adversarial training can be further accelerated by using standard techniques for efficient training of deep networks, allowing us to learn a robust CIFAR10 classifier with 45% robust accuracy to PGD attacks with $\epsilon=8/255$ in 6 minutes, and a robust ImageNet classifier with 43% robust accuracy at $\epsilon=2/255$ in 12 hours, in comparison to past work based on "free" adversarial training which took 10 and 50 hours to reach the same respective thresholds. Finally, we identify a failure mode referred to as "catastrophic overfitting" which may have caused previous attempts to use FGSM adversarial training to fail. All code for reproducing the experiments in this paper as well as pretrained model weights are at https://github.com/locuslab/fast_adversarial.
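The key ingredient the abstract highlights is combining a single FGSM step with a random initialization inside the perturbation ball. A minimal PyTorch sketch of that attack step might look as follows; the function name, `alpha` step size, and pixel-range clamp are illustrative assumptions, not the authors' exact implementation (see the linked repository for that):

```python
import torch

def fgsm_with_random_init(model, loss_fn, x, y, eps, alpha):
    """One FGSM step starting from a random point in the L-inf eps-ball.

    Hypothetical helper for illustration; `alpha` is the FGSM step size
    (the paper's repo uses a step larger than eps, e.g. 1.25 * eps).
    """
    # Random start uniformly inside [-eps, eps]^d -- the component the
    # abstract credits with making FGSM training effective.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    delta.requires_grad_(True)

    # Single gradient computation (vs. many iterations for PGD).
    loss = loss_fn(model(x + delta), y)
    loss.backward()

    # One signed-gradient step, projected back onto the eps-ball.
    delta = (delta + alpha * delta.grad.sign()).clamp(-eps, eps).detach()

    # Keep the adversarial example in the valid pixel range [0, 1].
    return (x + delta).clamp(0.0, 1.0)
```

During training, the returned adversarial batch simply replaces the clean batch in the usual loss/backward/step loop, which is why the overall cost stays close to standard training: one extra forward/backward pass per batch instead of the many required by PGD.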