Blackbox Attacks via Surrogate Ensemble Search

Paper Title

Blackbox Attacks via Surrogate Ensemble Search

Authors

Cai, Zikui; Song, Chengyu; Krishnamurthy, Srikanth; Roy-Chowdhury, Amit; Asif, M. Salman

Abstract

Blackbox adversarial attacks can be categorized into transfer- and query-based attacks. Transfer methods do not require any feedback from the victim model, but provide lower success rates than query-based methods. Query attacks, in turn, often require a large number of queries to succeed. Recent efforts have tried to combine the two approaches to get the best of both, but they still require hundreds of queries to achieve high success rates (especially for targeted attacks). In this paper, we propose a novel method for Blackbox Attacks via Surrogate Ensemble Search (BASES) that can generate highly successful blackbox attacks using an extremely small number of queries. We first define a perturbation machine that generates a perturbed image by minimizing a weighted loss function over a fixed set of surrogate models. To generate an attack for a given victim model, we search over the weights in the loss function using queries generated by the perturbation machine. Since the dimension of the search space is small (equal to the number of surrogate models), the search requires only a small number of queries. We demonstrate that our proposed method achieves a better success rate with at least 30x fewer queries than state-of-the-art methods on different image classifiers trained on ImageNet. In particular, our method requires as few as 3 queries per image (on average) to achieve more than a 90% success rate for targeted attacks, and 1-2 queries per image to achieve over a 99% success rate for untargeted attacks. Our method is also effective against the Google Cloud Vision API, achieving a 91% untargeted attack success rate with 2.9 queries per image. We also show that the perturbations generated by our proposed method are highly transferable and can be adopted for hard-label blackbox attacks. Finally, we show the effectiveness of BASES for hiding attacks on object detectors.
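The two-level idea in the abstract (an inner "perturbation machine" that attacks a weighted surrogate ensemble, and an outer search over the low-dimensional weight vector that spends one victim query per candidate) can be illustrated with a toy sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the models are tiny linear classifiers standing in for ImageNet networks, the loss is cross-entropy toward the target class, the inner step is targeted L-infinity PGD with sign steps, and the outer search is a simple "up-weight one surrogate, renormalize, keep it if the victim's loss drops" coordinate rule. The function names `perturbation_machine`, `ensemble_weight_search`, and `victim_query` are hypothetical.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear_logits(W, x):
    # Toy "model": logits = W x (hypothetical stand-in for a real image classifier).
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def argmax(v):
    return max(range(len(v)), key=v.__getitem__)


def sign(v):
    return (v > 0) - (v < 0)


def ce_grad(W, x, target):
    """Gradient w.r.t. x of cross-entropy(softmax(W x), target) for a linear model."""
    p = softmax(linear_logits(W, x))
    p[target] -= 1.0
    return [sum(W[k][j] * p[k] for k in range(len(W))) for j in range(len(x))]


def perturbation_machine(x, surrogates, weights, target, eps, steps=10):
    """Targeted L-inf PGD on the weighted ensemble loss sum_i w_i * CE_i.
    Uses white-box gradients of the surrogates only; the victim is never queried here."""
    alpha = eps / 4
    delta = [0.0] * len(x)
    for _ in range(steps):
        xa = [a + d for a, d in zip(x, delta)]
        g = [0.0] * len(x)
        for w, W in zip(weights, surrogates):
            for j, gj in enumerate(ce_grad(W, xa, target)):
                g[j] += w * gj
        # Step against the gradient (minimize the targeted loss), then project onto the eps-ball.
        delta = [max(-eps, min(eps, d - alpha * sign(gj))) for d, gj in zip(delta, g)]
    return delta


def ensemble_weight_search(x, surrogates, victim_query, target, eps, budget=20, step=0.5):
    """Search the surrogate weights using only the victim's output probabilities.
    Each candidate weight vector costs exactly one victim query."""
    n = len(surrogates)
    weights = [1.0 / n] * n  # start from a uniform ensemble
    queries = 0

    def evaluate(w):
        nonlocal queries
        delta = perturbation_machine(x, surrogates, w, target, eps)
        probs = victim_query([a + d for a, d in zip(x, delta)])
        queries += 1
        return delta, probs, -math.log(max(probs[target], 1e-12))

    delta, probs, loss = evaluate(weights)
    i = 0
    while argmax(probs) != target and queries < budget:
        # Simplified coordinate step (an assumption): up-weight one surrogate, renormalize.
        cand = weights[:]
        cand[i % n] += step
        total = sum(cand)
        cand = [c / total for c in cand]
        c_delta, c_probs, c_loss = evaluate(cand)
        if c_loss < loss or argmax(c_probs) == target:
            weights, delta, probs, loss = cand, c_delta, c_probs, c_loss
        i += 1
    return delta, weights, queries, argmax(probs) == target


# Toy usage: the victim and both surrogates are hypothetical 2-class linear models.
victim_W = [[1.0, 0.0], [0.0, 1.0]]
surrogates = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]]]
delta, weights, queries, success = ensemble_weight_search(
    [1.0, 0.0], surrogates, lambda xq: softmax(linear_logits(victim_W, xq)), target=1, eps=1.0)
print(success, queries, delta)  # True 1 [-1.0, 1.0]
```

In this toy case the uniform ensemble already transfers, so the attack succeeds on the first query. The point of the structure is that the outer loop only ever searches an n-dimensional weight vector (n = number of surrogates), which is why the query count can stay small compared with searching directly in image space.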