Paper Title
Understanding Adversarial Robustness Against On-manifold Adversarial Examples
Paper Authors
Paper Abstract
Deep neural networks (DNNs) are shown to be vulnerable to adversarial examples: a well-trained model can be easily fooled by adding small perturbations to the original data. One hypothesis for the existence of adversarial examples is the off-manifold assumption: adversarial examples lie off the data manifold. However, recent research has shown that on-manifold adversarial examples also exist. In this paper, we revisit the off-manifold assumption and study the question: to what degree is the poor performance of neural networks against adversarial attacks due to on-manifold adversarial examples? Since the true data manifold is unknown in practice, we consider two kinds of approximated on-manifold adversarial examples on both real and synthetic datasets. On real datasets, we show that on-manifold adversarial examples achieve higher attack rates than off-manifold adversarial examples against both standard-trained and adversarially-trained models. On synthetic datasets, we theoretically prove that on-manifold adversarial examples are powerful, yet adversarial training focuses on off-manifold directions and ignores on-manifold adversarial examples. Furthermore, we provide analysis showing that the properties derived theoretically can also be observed in practice. Our analysis suggests that on-manifold adversarial examples are important, and that more attention should be paid to on-manifold adversarial examples when training robust models.
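To make the distinction in the abstract concrete, below is a minimal PyTorch sketch contrasting a standard input-space perturbation (which may leave the data manifold) with one common way to approximate an on-manifold adversarial example: perturbing the latent code of a pretrained generative model and decoding it back. The paper's exact construction is not specified in the abstract, so the autoencoder-based approximation, the function names, architectures, and hyper-parameters here are all illustrative assumptions, not the authors' method.

```python
# Sketch only: contrasts input-space vs. latent-space (approximate on-manifold) attacks.
import torch
import torch.nn.functional as F


def fgsm_attack(model, x, y, eps=8 / 255):
    """Standard input-space perturbation; nothing constrains it to the data manifold."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()          # one gradient-sign step in pixel space
    return x_adv.clamp(0, 1).detach()


def on_manifold_attack(model, encoder, decoder, x, y, eps=0.1, steps=10, lr=0.01):
    """Approximate on-manifold attack (assumption): perturb the latent code of a
    pretrained autoencoder so the decoded sample stays near the learned manifold."""
    with torch.no_grad():
        z0 = encoder(x)                       # latent representation of the clean input
    delta = torch.zeros_like(z0, requires_grad=True)
    for _ in range(steps):
        x_adv = decoder(z0 + delta)           # decoded sample lies on the learned manifold
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()   # ascend the classification loss in latent space
            delta.clamp_(-eps, eps)           # keep the latent perturbation small
            delta.grad.zero_()
    return decoder(z0 + delta).detach()


if __name__ == "__main__":
    import torch.nn as nn
    # Toy stand-ins so the sketch runs end-to-end; not the paper's architectures.
    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 32))
    decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid(), nn.Unflatten(1, (1, 28, 28)))
    x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
    print(fgsm_attack(model, x, y).shape)
    print(on_manifold_attack(model, encoder, decoder, x, y).shape)
```

The attack rates compared in the abstract would then be measured by how often each kind of perturbed sample flips the model's prediction on standard-trained versus adversarially-trained models.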