Adversarial Black-Box Attacks On Text Classifiers Using Multi-Objective Genetic Optimization Guided By Deep Networks

Paper Title

Adversarial Black-Box Attacks On Text Classifiers Using Multi-Objective Genetic Optimization Guided By Deep Networks

Authors

Mathai, Alex; Khare, Shreya; Tamilselvam, Srikanth; Mani, Senthil

Abstract

We propose a novel genetic-algorithm technique that generates black-box adversarial examples which successfully fool neural-network-based text classifiers. We perform a genetic search with multi-objective optimization, guided by deep-learning-based inferences and Seq2Seq mutation, to generate semantically similar but imperceptible adversaries. We compare our approach with DeepWordBug (DWB) on the SST and IMDB sentiment datasets by attacking three trained models, viz. char-LSTM, word-LSTM, and elmo-LSTM. On average, we achieve an attack success rate of 65.67% for SST and 36.45% for IMDB across the three models, an improvement of 49.48% and 101%, respectively. Furthermore, our qualitative study indicates that 94% of the time, users were not able to distinguish between an original and an adversarial sample.
