Paper Title
Generating Adversarial Inputs Using A Black-box Differential Technique
Paper Authors
Paper Abstract
Neural Networks (NNs) are known to be vulnerable to adversarial attacks. A malicious agent initiates these attacks by perturbing an input into another one such that the two inputs are classified differently by the NN. In this paper, we consider a special class of adversarial examples, which exhibit not only the weakness of an NN model, as typical adversarial examples do, but also the behavioral difference between two NN models. We call them difference-inducing adversarial examples, or DIAEs. Specifically, we propose DAEGEN, the first black-box differential technique for adversarial input generation. DAEGEN takes as input two NN models for the same classification problem and reports as output an adversarial example. The obtained adversarial example is a DIAE; it represents a point-wise difference in the input space between the two NN models. Algorithmically, DAEGEN uses a local search-based optimization algorithm to find DIAEs by iteratively perturbing an input so as to maximize the difference between the two models' predictions on it. We conduct experiments on a spectrum of benchmark datasets (e.g., MNIST, ImageNet, and Driving) and NN models (e.g., LeNet, ResNet, Dave, and VGG). Experimental results are promising. First, we compare DAEGEN with two existing white-box differential techniques (DeepXplore and DLFuzz) and find that, under the same setting, DAEGEN is 1) effective, i.e., it is the only technique that succeeds in generating attacks in all cases; 2) precise, i.e., the adversarial attacks are very likely to fool both machines and humans; and 3) efficient, i.e., it requires a reasonable number of classification queries. Second, we compare DAEGEN with state-of-the-art black-box adversarial attack methods (SimBA and TREMBA) by adapting them to work in the differential setting. The experimental results show that DAEGEN performs better than both of them.
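
To make the search loop concrete, the following is a minimal sketch of the local-search idea described in the abstract. It is not the authors' implementation: the two black-box models model_a and model_b (assumed to return class-probability vectors), the L1 prediction-difference objective, the single-coordinate perturbation move, and the L-infinity budget epsilon are all illustrative assumptions.

import numpy as np

# Minimal sketch, not the paper's actual DAEGEN algorithm. model_a and
# model_b are assumed black-box functions that map an input array to a
# vector of class probabilities; only queries to them are used (no gradients).

def daegen_sketch(model_a, model_b, x, epsilon=0.05, step=0.01,
                  max_queries=10000, rng=None):
    """Greedy local search: iteratively perturb x within an L-infinity ball
    of radius epsilon to maximize the L1 difference between the two models'
    prediction vectors; stop once their predicted labels disagree."""
    rng = np.random.default_rng() if rng is None else rng
    x0 = x.copy()
    pa, pb = model_a(x), model_b(x)          # 2 classification queries
    best = np.abs(pa - pb).sum()
    queries = 2
    while queries + 2 <= max_queries:
        # Local-search move: nudge one randomly chosen coordinate, staying
        # inside the perturbation budget around the original input x0.
        candidate = x.copy()
        i = rng.integers(candidate.size)
        candidate.flat[i] = np.clip(candidate.flat[i] + rng.choice([-step, step]),
                                    x0.flat[i] - epsilon, x0.flat[i] + epsilon)
        ca, cb = model_a(candidate), model_b(candidate)
        queries += 2
        score = np.abs(ca - cb).sum()
        if score > best:                      # keep the move only if it widens the gap
            x, pa, pb, best = candidate, ca, cb, score
        if pa.argmax() != pb.argmax():        # the models now classify x differently
            return x                          # x is a difference-inducing example (DIAE)
    return None                               # query budget exhausted

As a usage example, daegen_sketch(model_a, model_b, image) returns a DIAE if the two models are driven to disagree within the query budget, and None otherwise. The greedy accept-if-better rule is the simplest form of local search; the paper's optimization objective and perturbation scheme may differ.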