Paper Title

Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations

Paper Authors

Lei Hsiung, Yun-Yun Tsai, Pin-Yu Chen, Tsung-Yi Ho

Paper Abstract

Model robustness against adversarial examples of a single perturbation type, such as the $\ell_{p}$-norm, has been widely studied, yet its generalization to more realistic scenarios involving multiple semantic perturbations and their composition remains largely unexplored. In this paper, we first propose a novel method for generating composite adversarial examples. Our method can find the optimal attack composition by utilizing component-wise projected gradient descent and automatic attack-order scheduling. We then propose generalized adversarial training (GAT) to extend model robustness from the $\ell_{p}$-ball to composite semantic perturbations, such as the combination of Hue, Saturation, Brightness, Contrast, and Rotation. Results obtained using the ImageNet and CIFAR-10 datasets indicate that GAT is robust not only to all the tested types of single attacks, but also to any combination of such attacks. GAT also outperforms baseline $\ell_{\infty}$-norm bounded adversarial training approaches by a significant margin.
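
The abstract describes generating composite adversarial examples via component-wise projected gradient descent over semantic perturbation parameters. Below is a minimal PyTorch sketch of that idea for just two components (brightness shift and rotation), assuming a standard image classifier with inputs in [0, 1]. All function names, step sizes, and perturbation bounds here are hypothetical illustrations, and the attack order is fixed in this sketch, whereas the paper's method additionally learns the order via automatic attack-order scheduling.

```python
# Minimal sketch of component-wise PGD over a composite of two semantic
# perturbations (brightness and rotation). Illustrative only; not the
# authors' released implementation. Bounds and step size are hypothetical.
import torch
import torch.nn.functional as F


def apply_brightness(x, delta):
    # Additive brightness shift, differentiable w.r.t. delta (shape: [B]).
    return torch.clamp(x + delta.view(-1, 1, 1, 1), 0.0, 1.0)


def apply_rotation(x, angle_rad):
    # Differentiable rotation via an affine grid and bilinear sampling.
    cos, sin = torch.cos(angle_rad), torch.sin(angle_rad)
    zeros = torch.zeros_like(cos)
    theta = torch.stack(
        [torch.stack([cos, -sin, zeros], dim=-1),
         torch.stack([sin, cos, zeros], dim=-1)], dim=-2)  # (B, 2, 3)
    grid = F.affine_grid(theta, x.shape, align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)


def composite_pgd(model, x, y, steps=10, bright_eps=0.2, rot_eps=0.17, lr=0.05):
    """Component-wise PGD: each semantic component has its own scalar
    parameter, updated by signed gradient ascent on the classification
    loss and projected back into its own bound after every step."""
    b = x.size(0)
    params = {
        "brightness": torch.zeros(b, device=x.device, requires_grad=True),
        "rotation": torch.zeros(b, device=x.device, requires_grad=True),
    }
    bounds = {"brightness": bright_eps, "rotation": rot_eps}
    ops = {"brightness": apply_brightness, "rotation": apply_rotation}
    order = ["brightness", "rotation"]  # fixed order in this sketch

    for _ in range(steps):
        x_adv = x
        for name in order:
            x_adv = ops[name](x_adv, params[name])
        loss = F.cross_entropy(model(x_adv), y)
        grads = torch.autograd.grad(loss, list(params.values()))
        with torch.no_grad():
            for (name, p), g in zip(params.items(), grads):
                p += lr * g.sign()                     # ascend the loss
                p.clamp_(-bounds[name], bounds[name])  # project into the ball
    with torch.no_grad():
        x_adv = x
        for name in order:
            x_adv = ops[name](x_adv, params[name])
    return x_adv.detach()
```

A full implementation would treat each component named in the abstract (hue, saturation, brightness, contrast, rotation) analogously with its own differentiable transform and bound, and GAT would then train the model on the resulting composite adversarial examples.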
