(De)Randomized Smoothing for Certifiable Defense against Patch Attacks

Paper Title

(De)Randomized Smoothing for Certifiable Defense against Patch Attacks

Authors

Alexander Levine, Soheil Feizi

Abstract

Patch adversarial attacks on images, in which the attacker can distort pixels within a region of bounded size, are an important threat model because they provide a quantitative model for physical adversarial attacks. In this paper, we introduce a certifiable defense against patch attacks that guarantees that, for a given image and patch attack size, no patch adversarial examples exist. Our method is related to the broad class of randomized smoothing robustness schemes, which provide high-confidence probabilistic robustness certificates. By exploiting the fact that patch attacks are more constrained than general sparse attacks, we derive meaningfully large robustness certificates against them. Additionally, in contrast to smoothing-based defenses against L_p and sparse attacks, our defense method against patch attacks is de-randomized, yielding improved, deterministic certificates. Compared to the existing patch certification method proposed by Chiang et al. (2020), which relies on interval bound propagation, our method can be trained significantly faster, achieves high clean and certified robust accuracy on CIFAR-10, and provides certificates at ImageNet scale. For example, for a 5-by-5 patch attack on CIFAR-10, our method achieves up to around 57.6% certified accuracy (with a classifier with around 83.8% clean accuracy), compared to at most 30.3% certified accuracy for the existing method (with a classifier with around 47.8% clean accuracy). Our results effectively establish a new state of the art in certifiable defense against patch attacks on CIFAR-10 and ImageNet. Code is available at https://github.com/alevine0/patchSmoothing.
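The abstract does not spell out how the de-randomized certificate is computed. The sketch below is minimal and rests on an assumption: a column-ablation scheme in which the base classifier sees only one band of b adjacent columns at a time, with bands wrapping around the image edge. Every band position is evaluated and the resulting votes are counted. Because the bands are enumerated rather than sampled, the vote counts are exact and the certificate is deterministic. A patch m columns wide can overlap at most Δ = m + b − 1 bands, so it can move at most Δ votes away from the predicted class and toward any rival. The helper names (`column_bands`, `ablate`, `certify`) and the choice to zero out ablated pixels are illustrative assumptions, not the code in the linked repository.

```python
from collections import Counter
from typing import Callable, Sequence

import numpy as np


def column_bands(width: int, band: int) -> list[list[int]]:
    """Column indices of every band of `band` adjacent columns, wrapping at the edge."""
    return [[(start + k) % width for k in range(band)] for start in range(width)]


def ablate(image: np.ndarray, cols: Sequence[int]) -> np.ndarray:
    """Keep only the columns in `cols` and zero out the rest (assumed H x W [x C] layout)."""
    out = np.zeros_like(image)
    out[:, list(cols)] = image[:, list(cols)]
    return out


def certify(
    image: np.ndarray,
    base_classifier: Callable[[np.ndarray], int],  # hypothetical: returns a class index
    num_classes: int,
    band: int,
    patch: int,
) -> tuple[int, bool]:
    """Return (predicted class, whether it is certified against a `patch`-wide patch)."""
    width = image.shape[1]
    counts = Counter(
        int(base_classifier(ablate(image, cols))) for cols in column_bands(width, band)
    )
    # Majority vote; ties go to the smaller class index so the decision is deterministic.
    predicted = max(range(num_classes), key=lambda c: (counts[c], -c))
    # A patch `patch` columns wide overlaps at most patch + band - 1 of the wrapped bands.
    delta = patch + band - 1
    for other in range(num_classes):
        if other == predicted:
            continue
        gap = counts[predicted] - counts[other]
        # The patch can remove `delta` votes from `predicted` and add `delta` to `other`;
        # an exact tie after the attack still favours the smaller index.
        needed = 2 * delta if predicted < other else 2 * delta + 1
        if gap < needed:
            return predicted, False
    return predicted, True
```

Each image costs one forward pass per column, which is affordable for a 32-pixel-wide CIFAR-10 image. A wider band leaves more of the image visible to the base classifier, but it also raises Δ and therefore the vote margin needed for a certificate.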
