Towards Understanding and Harnessing the Effect of Image Transformation in Adversarial Detection

Paper Title

Towards Understanding and Harnessing the Effect of Image Transformation in Adversarial Detection

Authors

Liu, Hui; Zhao, Bo; Peng, Yuefeng; Li, Weidong; Liu, Peng

Abstract

Deep neural networks (DNNs) are threatened by adversarial examples. Adversarial detection, which distinguishes adversarial images from benign ones, is fundamental to robust DNN-based services. Image transformation is one of the most effective approaches to detecting adversarial examples. Over the last few years, a variety of image transformations have been studied and discussed with the aim of designing reliable adversarial detectors. In this paper, we systematically synthesize recent progress on adversarial detection via image transformations using a novel classification method. We then conduct extensive experiments to test the detection performance of image transformations against state-of-the-art adversarial attacks. Furthermore, we show that no individual transformation can detect adversarial examples robustly, and we propose a DNN-based approach, AdvJudge, which combines the scores of 9 image transformations. Without knowing which individual scores are misleading, AdvJudge can make the right judgment and achieve a significant improvement in detection rate. Finally, we use an explainable AI tool to show the contribution of each image transformation to adversarial detection. Experimental results show that the contributions of individual image transformations to adversarial detection differ significantly, and that combining them can significantly improve generic detection ability against state-of-the-art adversarial attacks.
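
The abstract does not say how the per-transformation scores are computed or how AdvJudge is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a score is the L1 distance between a classifier's prediction on an image and on its transformed copy, uses three hypothetical transformations instead of the paper's 9, and stands in an untrained two-layer network for the combiner. The names `toy_model`, `transformation_scores`, and `combine_scores` are all illustrative.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def toy_model(image, weights):
    """Stand-in classifier (assumption): one linear layer plus softmax."""
    return softmax(weights @ image.ravel())


# Hypothetical transformations; the abstract does not list the 9 it uses.
def reduce_bit_depth(image, bits=3):
    levels = 2 ** bits - 1
    return np.round(image * levels) / levels


def flip_horizontal(image):
    return image[:, ::-1]


def box_blur(image):
    """3x3 mean filter with edge padding."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    total = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return total / 9.0


TRANSFORMS = [reduce_bit_depth, flip_horizontal, box_blur]


def transformation_scores(image, weights):
    """One score per transformation: L1 distance between the prediction on
    the original image and the prediction on the transformed image."""
    base = toy_model(image, weights)
    return np.array(
        [np.abs(base - toy_model(t(image), weights)).sum() for t in TRANSFORMS]
    )


def combine_scores(scores, w1, b1, w2, b2):
    """Assumed combiner: a tiny ReLU network mapping the score vector to a
    probability that the input is adversarial. Weights here are untrained."""
    hidden = np.maximum(0.0, w1 @ scores + b1)
    logit = w2 @ hidden + b2
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
image = rng.random((8, 8))                # grayscale image with values in [0, 1)
weights = rng.normal(size=(10, 64))       # 10-class toy classifier
scores = transformation_scores(image, weights)

w1, b1 = rng.normal(size=(4, len(TRANSFORMS))), np.zeros(4)
w2, b2 = rng.normal(size=4), 0.0
probability = combine_scores(scores, w1, b1, w2, b2)
```

In a real detector the combiner would be trained on score vectors from labeled benign and adversarial images, which is what would let it learn which scores to trust for a given input.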
