Paper Title
Scaling Adversarial Training to Large Perturbation Bounds
Paper Authors
Paper Abstract
The vulnerability of Deep Neural Networks to Adversarial Attacks has fuelled research towards building robust models. While most Adversarial Training algorithms aim at defending against attacks constrained within low-magnitude Lp norm bounds, real-world adversaries are not limited by such constraints. In this work, we aim to achieve adversarial robustness within larger bounds, against perturbations that may be perceptible, but do not change human (or Oracle) prediction. The presence of images that flip Oracle predictions alongside those that do not makes this a challenging setting for adversarial robustness. We discuss the ideal goals of an adversarial defense algorithm beyond perceptual limits, and further highlight the shortcomings of naively extending existing training algorithms to higher perturbation bounds. In order to overcome these shortcomings, we propose a novel defense, Oracle-Aligned Adversarial Training (OA-AT), to align the predictions of the network with those of an Oracle during adversarial training. The proposed approach achieves state-of-the-art performance at large epsilon bounds (such as an L-inf bound of 16/255 on CIFAR-10) while also outperforming existing defenses (AWP, TRADES, PGD-AT) at standard bounds (8/255).
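To make the threat model concrete, the sketch below shows a standard PGD L-inf attack and one vanilla PGD-AT training step at the large bound of 16/255 discussed above. This is a minimal sketch of the PGD-AT baseline that the abstract contrasts against, not the paper's OA-AT algorithm; the step size, step count, and helper names are illustrative assumptions, and images are assumed to lie in [0, 1].

```python
# Minimal PGD-AT sketch under an L-inf threat model (PyTorch assumed).
# NOT the paper's OA-AT; eps, alpha, and steps are illustrative choices.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=16 / 255, alpha=2 / 255, steps=10):
    """Projected gradient ascent on the cross-entropy loss, constrained
    to an L-inf ball of radius eps around the clean batch x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to L-inf ball
            x_adv = x_adv.clamp(0, 1)                              # keep valid pixel range
    return x_adv.detach()


def pgd_at_step(model, optimizer, x, y, eps=16 / 255):
    """One adversarial-training step: craft PGD examples, then minimize
    the loss on them (the min-max objective of standard PGD-AT)."""
    model.eval()                      # freeze BN statistics while attacking
    x_adv = pgd_attack(model, x, y, eps=eps)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Running this loop unchanged at eps = 16/255 is precisely the "naive extension to higher perturbation bounds" whose shortcomings the abstract highlights: at such radii some perturbed images genuinely flip the Oracle label, which OA-AT addresses by aligning training targets with Oracle predictions.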