Paper Title
Breaking certified defenses: Semantic adversarial examples with spoofed robustness certificates
Paper Authors
Paper Abstract
To deflect adversarial attacks, a range of "certified" classifiers have been proposed. In addition to labeling an image, certified classifiers produce (when possible) a certificate guaranteeing that the input image is not an $\ell_p$-bounded adversarial example. We present a new attack that exploits not only the labeling function of a classifier, but also the certificate generator. The proposed method applies large perturbations that place images far from a class boundary while maintaining the imperceptibility property of adversarial examples. The proposed "Shadow Attack" causes certifiably robust networks to mislabel an image and simultaneously produce a "spoofed" certificate of robustness.
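For illustration, the sketch below shows one way an attack of this general kind could be set up: it optimizes a large but smooth perturbation that pushes a classifier's prediction away from the true class. The model interface, penalty terms (total variation, mean color shift, cross-channel dissimilarity), and hyperparameter values are illustrative assumptions, not the authors' exact formulation, and the certificate-spoofing evaluation is omitted.

# Hypothetical sketch (not the authors' released code): search for a large,
# smooth perturbation that flips the label of a user-supplied classifier.
import torch
import torch.nn.functional as F

def shadow_style_attack(model, x, y, steps=200, lr=0.1,
                        lam_tv=0.1, lam_mean=20.0, lam_sim=10.0):
    """x: image batch of shape (1, 3, H, W) in [0, 1]; y: true label, shape (1,)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = torch.clamp(x + delta, 0.0, 1.0)
        # Push the prediction away from the true class (maximize cross-entropy).
        attack_loss = -F.cross_entropy(model(adv), y)
        # Total-variation penalty keeps the (possibly large) perturbation smooth.
        tv = (delta[..., 1:, :] - delta[..., :-1, :]).abs().mean() + \
             (delta[..., :, 1:] - delta[..., :, :-1]).abs().mean()
        # Penalize the overall color shift and dissimilarity across color channels.
        mean_shift = delta.mean(dim=(2, 3)).abs().mean()
        chan_dissim = (delta - delta.mean(dim=1, keepdim=True)).abs().mean()
        loss = attack_loss + lam_tv * tv + lam_mean * mean_shift + lam_sim * chan_dissim
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.clamp(x + delta.detach(), 0.0, 1.0)

In this sketch, minimizing the combined loss trades off misclassification against smoothness and color-consistency penalties, so the perturbation can be large in norm yet remain visually unobtrusive; whether the resulting input also receives a spoofed robustness certificate would have to be checked against a certified classifier.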