Detecting and Recovering Adversarial Examples from Extracting Non-robust and Highly Predictive Adversarial Perturbations

Paper Title

Detecting and Recovering Adversarial Examples from Extracting Non-robust and Highly Predictive Adversarial Perturbations

Authors

Dong, Mingyu; Chen, Jiahao; Yan, Diqun; Gao, Jingxing; Dong, Li; Wang, Rangding

Abstract

Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples (AEs), which are maliciously designed to fool target models. Normal examples (NEs) with an imperceptible adversarial perturbation added can be a security threat to DNNs. Although existing AE detection methods have achieved high accuracy, they fail to exploit the information in the AEs they detect. Thus, based on high-dimension perturbation extraction, we propose a model-free AE detection method whose whole process is free from querying the victim model. Research shows that DNNs are sensitive to high-dimension features. The adversarial perturbation hidden in an adversarial example belongs to the high-dimension features, which are highly predictive and non-robust. DNNs learn more details from high-dimension data than others. In our method, a perturbation extractor extracts the adversarial perturbation from an AE as a high-dimension feature, and a trained AE discriminator then determines whether the input is an AE. Experimental results show that the proposed method can not only detect adversarial examples with high accuracy, but also identify the specific category of each AE. Meanwhile, the extracted perturbation can be used to recover AEs to NEs.
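The three stages named in the abstract (extract a perturbation, decide whether the input is an AE, subtract the perturbation to recover an NE) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration only: the abstract does not describe the architecture of the extractor or the discriminator, so a box-blur residual stands in for the learned extractor and a mean-magnitude threshold stands in for the trained discriminator. The function names, the kernel size `k`, and the `threshold` parameter are hypothetical. As in the abstract, no victim model is queried at any point.

```python
import numpy as np

def extract_perturbation(x, k=3):
    """Stand-in extractor (assumption): residual of x against a k x k box blur.

    The paper trains a network for this step; a high-pass residual is used
    here only to make the data flow concrete.
    """
    pad = k // 2
    padded = np.pad(x, pad, mode="edge")
    blurred = np.zeros_like(x, dtype=float)
    # Sum the k*k shifted copies of the image, then average them.
    for dy in range(k):
        for dx in range(k):
            blurred += padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    blurred /= k * k
    return x - blurred

def is_adversarial(delta, threshold):
    """Stand-in discriminator (assumption): flag inputs whose extracted
    perturbation has a mean magnitude above a fixed threshold."""
    return float(np.mean(np.abs(delta))) > threshold

def recover(x, delta):
    """Recover an estimate of the normal example by removing the extracted
    perturbation, keeping pixel values in [0, 1]."""
    return np.clip(x - delta, 0.0, 1.0)
```

A possible usage: for a grayscale image `x` with values in [0, 1], call `delta = extract_perturbation(x)`, test `is_adversarial(delta, threshold)`, and, if it is flagged, pass `recover(x, delta)` on in place of `x`. The threshold would have to be tuned on held-out data; no value is suggested by the source.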
