Paper Title

Adversarial Imitation Attack

Authors

Mingyi Zhou, Jing Wu, Yipeng Liu, Xiaolin Huang, Shuaicheng Liu, Xiang Zhang, Ce Zhu

Abstract


Deep learning models are known to be vulnerable to adversarial examples. A practical adversarial attack should require as little knowledge of the attacked model as possible. Current substitute attacks need pre-trained models to generate adversarial examples, and their attack success rates rely heavily on the transferability of those examples. Current score-based and decision-based attacks require many queries to the attacked model. In this study, we propose a novel adversarial imitation attack. First, it produces a replica of the attacked model through a two-player game resembling generative adversarial networks (GANs): the generative model aims to generate examples on which the imitation model and the attacked model return different outputs, while the imitation model aims to output the same labels as the attacked model for the same inputs. Then, adversarial examples generated on the imitation model are used to fool the attacked model. Compared with current substitute attacks, the imitation attack can use less training data to produce a replica of the attacked model and improves the transferability of adversarial examples. Experiments demonstrate that our imitation attack requires less training data than black-box substitute attacks, yet achieves an attack success rate close to that of a white-box attack on unseen data, with no queries.
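The two-player game above can be sketched in miniature. The toy below is an assumption-laden illustration, not the paper's method: the black-box "attacked model" is a hard-label linear classifier, the imitation model is a logistic classifier, and the GAN-style generator is reduced to a one-step perturbation that pushes queries toward the imitation model's decision boundary, where the two models are most likely to disagree. All names (`attacked_model`, `imitation_prob`, the learning rates) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box "attacked model": only hard labels are observable,
# mirroring the decision-based black-box setting described in the abstract.
true_w = np.array([2.0, -1.0])

def attacked_model(x):
    """Return hard labels -- the only feedback a black-box attacker gets."""
    return (x @ true_w > 0).astype(float)

def imitation_prob(x, w):
    """Imitation model: a logistic classifier standing in for the replica."""
    z = np.clip(x @ w, -30.0, 30.0)  # clip to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)                      # imitation model parameters
X = rng.normal(size=(128, 2))        # small unlabeled query pool

for step in range(200):
    # "Generator" role (greatly simplified): nudge queries toward the
    # imitation model's decision boundary, the region where imitation
    # and attacked model are most likely to return different outputs.
    u = w / (np.linalg.norm(w) + 1e-9)
    Xq = X - 0.1 * np.sign(X @ w + 1e-9)[:, None] * u

    # "Imitation" role: fit the replica to the black box's labels
    # with one logistic-loss gradient step.
    y = attacked_model(Xq)
    p = imitation_prob(Xq, w)
    w -= 0.5 * Xq.T @ (p - y) / len(Xq)

# The replica should now agree with the black box on unseen data.
X_test = rng.normal(size=(1000, 2))
agreement = np.mean((imitation_prob(X_test, w) > 0.5) == attacked_model(X_test))

# White-box attack on the replica, transferred to the black box with no
# further queries: push a confidently classified point across the replica's
# boundary (with some overshoot) and check that the black box flips too.
x0 = np.array([1.5, -0.5])                   # attacked model labels this 1
x_adv = x0 - 1.5 * (x0 @ w / (w @ w)) * w    # cross the replica's boundary
```

Because the replica's boundary closely tracks the black box's, an adversarial example crafted white-box against the replica transfers: `attacked_model(x_adv)` flips to 0 even though the black box was never queried during the attack itself.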
