Paper Title
FADER: Fast Adversarial Example Rejection
Paper Authors
Paper Abstract
Deep neural networks are vulnerable to adversarial examples, i.e., carefully crafted inputs that mislead classification at test time. Recent defenses have been shown to improve adversarial robustness by detecting anomalous deviations from legitimate training samples across different layer representations, a behavior normally exhibited by adversarial attacks. Despite technical differences, all the aforementioned methods share a common backbone structure that we formalize and highlight in this contribution, as it can help in identifying promising research directions and drawbacks of existing methods. The first main contribution of this work is a review of these detection methods in the form of a unifying framework designed to accommodate both existing defenses and newer ones to come. In terms of drawbacks, the aforementioned defenses require comparing input samples against an oversized number of reference prototypes, possibly at different representation layers, which dramatically worsens test-time efficiency. Moreover, such defenses are typically built by ensembling classifiers with heuristic methods, rather than by optimizing the whole architecture end to end to better perform detection. As the second main contribution of this work, we introduce FADER, a novel technique for speeding up detection-based methods. FADER overcomes the issues above by employing RBF networks as detectors: by fixing the number of required prototypes, the runtime complexity of adversarial example detectors can be controlled. Our experiments show up to a 73x reduction in prototypes compared to the analyzed detectors on the MNIST dataset, and up to 50x on the CIFAR10 dataset, without sacrificing classification accuracy on either clean or adversarial data.
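To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of an RBF-network detector with a fixed prototype budget, assuming PyTorch and a feature vector already extracted from one hidden layer of the protected classifier; the names `RBFDetector`, `n_prototypes`, and `rejection_score` are illustrative only.

```python
import torch
import torch.nn as nn


class RBFDetector(nn.Module):
    """Illustrative RBF-network detector: compares a layer representation
    against a small, fixed set of learned prototypes."""

    def __init__(self, feature_dim: int, n_prototypes: int = 10, n_classes: int = 10):
        super().__init__()
        # Fixing the number of prototypes bounds the test-time cost,
        # unlike detectors that compare against many training samples.
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, feature_dim))
        self.log_gamma = nn.Parameter(torch.zeros(n_prototypes))  # per-prototype RBF width
        self.classifier = nn.Linear(n_prototypes, n_classes)

    def forward(self, features: torch.Tensor):
        # Squared Euclidean distance from each input to each prototype.
        dists = torch.cdist(features, self.prototypes) ** 2
        # RBF activations decay towards 0 as the input moves away from all prototypes.
        activations = torch.exp(-self.log_gamma.exp() * dists)
        logits = self.classifier(activations)
        # A low maximum activation means the sample lies far from every prototype,
        # so it can be rejected as a potential adversarial example.
        rejection_score = activations.max(dim=1).values
        return logits, rejection_score
```

In this hypothetical setup, the detector would be trained end to end on layer features of clean data, and at test time a sample whose `rejection_score` falls below a threshold chosen on validation data would be rejected; the per-sample cost scales with `n_prototypes` rather than with the size of the training set.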