Paper Title
L-RED: Efficient Post-Training Detection of Imperceptible Backdoor Attacks without Access to the Training Set
Paper Authors
Paper Abstract
Backdoor attacks (BAs) are an emerging form of adversarial attack typically against deep neural network image classifiers. The attacker aims to have the classifier learn to classify to a target class when test images from one or more source classes contain a backdoor pattern, while maintaining high accuracy on all clean test images. Reverse-Engineering-based Defenses (REDs) against BAs do not require access to the training set but only to an independent clean dataset. Unfortunately, most existing REDs rely on an unrealistic assumption that all classes except the target class are source classes of the attack. REDs that do not rely on this assumption often require a large set of clean images and heavy computation. In this paper, we propose a Lagrangian-based RED (L-RED) that does not require knowledge of the number of source classes (or whether an attack is present). Our defense requires very few clean images to effectively detect BAs and is computationally efficient. Notably, we detect 56 out of 60 BAs using only two clean images per class in our experiments on CIFAR-10.
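The abstract's central idea is that a RED reverse-engineers, for each putative target class, a small perturbation that flips clean images to that class, and that L-RED replaces an explicit search over source-class subsets with a Lagrange multiplier in a single unconstrained objective. The sketch below illustrates this style of pattern estimation under stated assumptions; it is not the paper's exact algorithm. The function name `reverse_engineer_pattern`, the multiplier `lam`, and the PyTorch classifier `model` and clean batch `x` are all hypothetical.

```python
# Minimal sketch of reverse-engineering a putative backdoor pattern with a
# Lagrangian objective (illustrative only; not the paper's exact method).
# Assumed: a trained PyTorch classifier `model` (set to eval mode) and a
# small batch of clean images `x` of shape [N, C, H, W] in [0, 1].
import torch
import torch.nn.functional as F

def reverse_engineer_pattern(model, x, target, lam=0.1, steps=500, lr=0.1):
    """Estimate an additive perturbation v that pushes clean images toward
    class `target` while keeping ||v||_2 small. The Lagrange multiplier
    `lam` trades off the two terms in one unconstrained objective, so no
    explicit constraint on (or knowledge of) the source classes is needed."""
    v = torch.zeros_like(x[0], requires_grad=True)  # one pattern, all images
    opt = torch.optim.Adam([v], lr=lr)
    t = torch.full((x.shape[0],), target, dtype=torch.long)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(torch.clamp(x + v, 0.0, 1.0))  # perturbed clean images
        # Lagrangian: misclassification loss + lam * pattern-size penalty
        loss = F.cross_entropy(logits, t) + lam * v.norm(p=2)
        loss.backward()  # only v is updated; model weights stay fixed
        opt.step()
    return v.detach()
```

A detector built on such a routine would flag an attack when, for some class, the estimated pattern induces high misclassification to that class with an anomalously small norm relative to the other classes.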