Paper Title
Do Perceptually Aligned Gradients Imply Adversarial Robustness?
Paper Authors
Paper Abstract
Adversarially robust classifiers possess a trait that non-robust models do not: Perceptually Aligned Gradients (PAG). Their gradients with respect to the input align well with human perception. Several works have identified PAG as a byproduct of robust training, but none have considered it as a standalone phenomenon nor studied its own implications. In this work, we focus on this trait and test whether Perceptually Aligned Gradients imply Robustness. To this end, we develop a novel objective to directly promote PAG in training classifiers and examine whether models with such gradients are more robust to adversarial attacks. Extensive experiments on multiple datasets and architectures validate that models with aligned gradients exhibit significant robustness, exposing the surprising bidirectional connection between PAG and robustness. Lastly, we show that better gradient alignment leads to increased robustness, and we harness this observation to boost the robustness of existing adversarial training techniques.
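As a rough illustration of how an objective that "directly promotes PAG" could be set up, the sketch below combines a standard classification loss with a term that encourages the classifier's input gradient to align with an externally supplied target direction. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the target gradient `g_target` (e.g., obtained from a denoiser or generative model), the cosine-similarity penalty, and the `alignment_weight` hyperparameter are all illustrative choices introduced here.

```python
import torch
import torch.nn.functional as F

def pag_training_loss(model, x, y, g_target, alignment_weight=1.0):
    """Cross-entropy plus a gradient-alignment penalty (illustrative sketch).

    g_target is an assumed, externally supplied "perceptually aligned"
    direction for each input; how it is obtained is outside this sketch.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Input gradient of the ground-truth class score, kept in the graph
    # so the alignment term can be backpropagated through.
    class_score = logits.gather(1, y.unsqueeze(1)).sum()
    input_grad = torch.autograd.grad(class_score, x, create_graph=True)[0]

    # Penalize misalignment between the input gradient and the target direction.
    cos = F.cosine_similarity(input_grad.flatten(1), g_target.flatten(1), dim=1)
    alignment = (1.0 - cos).mean()

    return ce + alignment_weight * alignment
```

In this sketch, minimizing the loss pushes the model's input gradients toward the target directions while still fitting the labels; the abstract's claim is that training with such aligned gradients yields measurable adversarial robustness.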