Paper Title
Breaking certified defenses: Semantic adversarial examples with spoofed robustness certificates
Paper Authors
Paper Abstract
To deflect adversarial attacks, a range of "certified" classifiers have been proposed. In addition to labeling an image, certified classifiers produce (when possible) a certificate guaranteeing that the input image is not an $\ell_p$-bounded adversarial example. We present a new attack that exploits not only the labeling function of a classifier, but also the certificate generator. The proposed method applies large perturbations that place images far from a class boundary while maintaining the imperceptibility property of adversarial examples. The proposed "Shadow Attack" causes certifiably robust networks to mislabel an image and simultaneously produce a "spoofed" certificate of robustness.
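For illustration, the sketch below shows one way an attack of this general kind could be set up: it optimizes a large but smooth perturbation that pushes a classifier's prediction away from the true class. The model interface, penalty terms (total variation, mean color shift, cross-channel dissimilarity), and hyperparameter values are illustrative assumptions, not the authors' exact formulation, and the certificate-spoofing evaluation is omitted.

# Hypothetical sketch (not the authors' released code): search for a large,
# smooth perturbation that flips the label of a user-supplied classifier.
import torch
import torch.nn.functional as F

def shadow_style_attack(model, x, y, steps=200, lr=0.1,
                        lam_tv=0.1, lam_mean=20.0, lam_sim=10.0):
    """x: image batch of shape (1, 3, H, W) in [0, 1]; y: true label, shape (1,)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = torch.clamp(x + delta, 0.0, 1.0)
        # Push the prediction away from the true class (maximize cross-entropy).
        attack_loss = -F.cross_entropy(model(adv), y)
        # Total-variation penalty keeps the (possibly large) perturbation smooth.
        tv = (delta[..., 1:, :] - delta[..., :-1, :]).abs().mean() + \
             (delta[..., :, 1:] - delta[..., :, :-1]).abs().mean()
        # Penalize the overall color shift and dissimilarity across color channels.
        mean_shift = delta.mean(dim=(2, 3)).abs().mean()
        chan_dissim = (delta - delta.mean(dim=1, keepdim=True)).abs().mean()
        loss = attack_loss + lam_tv * tv + lam_mean * mean_shift + lam_sim * chan_dissim
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.clamp(x + delta.detach(), 0.0, 1.0)

In this sketch, minimizing the combined loss trades off misclassification against smoothness and color-consistency penalties, so the perturbation can be large in norm yet remain visually unobtrusive; whether the resulting input also receives a spoofed robustness certificate would have to be checked against a certified classifier.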