Paper Title
Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty
Paper Authors
Paper Abstract
Dataset bias is a problem in adversarial machine learning, especially in the evaluation of defenses. An adversarial attack or defense algorithm may show better results on the reported dataset than can be replicated on other datasets. Even when two algorithms are compared, their relative performance can vary depending on the dataset. Deep learning offers state-of-the-art solutions for image recognition, but deep models are vulnerable even to small perturbations. Research in this area focuses primarily on adversarial attacks and defense algorithms. In this paper, we report, for the first time, a class of robust images that are both resilient to attacks and recover better than random images under adversarial attacks when simple defense techniques are applied. Thus, a test dataset with a high proportion of robust images gives a misleading impression of the performance of an adversarial attack or defense. We propose three metrics to determine the proportion of robust images in a dataset and provide scoring to quantify the dataset bias. We also provide an ImageNet-R dataset of 15000+ robust images to facilitate further research on this intriguing phenomenon of image strength under attack. Our dataset, combined with the proposed metrics, is valuable for unbiased benchmarking of adversarial attack and defense algorithms.
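The abstract argues that the fraction of robust images in a test set can skew reported attack and defense results. A minimal sketch of that idea, assuming a purely illustrative criterion (an image counts as "robust" if the model classifies it correctly under attack, or recovers after a simple defense); this is not the paper's actual metric, and `robust_proportion` is a hypothetical helper:

```python
# Hypothetical sketch: estimate the proportion of "robust" images in a
# test set from per-image evaluation outcomes. The robustness criterion
# here (correct under attack OR correct after defense) is illustrative
# and is not one of the three metrics proposed in the paper.

def robust_proportion(results):
    """results: list of (correct_under_attack, correct_after_defense) booleans."""
    robust = sum(1 for attacked, defended in results if attacked or defended)
    return robust / len(results)

# Toy evaluation log for five images.
log = [(False, True), (True, True), (False, False), (False, True), (True, False)]
print(robust_proportion(log))  # → 0.8
```

A benchmark run on two datasets with very different values of this proportion would report incomparable attack or defense success rates, which is the bias the proposed metrics are meant to expose.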