Paper Title
Imperceptible Backdoor Attack: From Input Space to Feature Representation
Paper Authors
Abstract
Backdoor attacks are rapidly emerging threats to deep neural networks (DNNs). In the backdoor attack scenario, attackers usually implant the backdoor into the target model by manipulating the training dataset or the training process. The compromised model then behaves normally on benign inputs yet makes mistakes when the pre-defined trigger appears. In this paper, we analyze the drawbacks of existing attack approaches and propose a novel imperceptible backdoor attack. We treat the trigger pattern as a special kind of noise following a multinomial distribution. A U-net-based network is employed to generate the concrete parameters of the multinomial distribution for each benign input. This elaborated trigger ensures that our approach is invisible to both human inspection and statistical detection. Besides the trigger design, we also consider the robustness of our approach against model diagnosis-based defences. We force the feature representation of malicious inputs stamped with the trigger to be entangled with that of benign inputs. We demonstrate the effectiveness and robustness of our approach against multiple state-of-the-art defences through extensive experiments across datasets and networks. Our trigger modifies less than 1% of the pixels of a benign image, with a modification magnitude of 1. Our source code is available at https://github.com/Ekko-zn/IJCAI2022-Backdoor.
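The core mechanism described above — sampling a sparse, magnitude-1 trigger from a per-pixel multinomial distribution — can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a hypothetical generator (standing in for the U-net) that outputs per-pixel logits over the three perturbation classes {-1, 0, +1}, and samples one class per pixel by inverse-CDF sampling:

```python
import numpy as np

def sample_trigger(logits, rng):
    """Sample a per-pixel perturbation in {-1, 0, +1}.

    logits: (H, W, 3) array, a hypothetical stand-in for the U-net
    output; the three channels correspond to the classes -1, 0, +1.
    """
    # Softmax over the class axis (numerically stabilized).
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    # Inverse-CDF sampling: one categorical draw per pixel.
    u = rng.random(p.shape[:2] + (1,))
    cdf = np.cumsum(p, axis=-1)
    idx = (u > cdf).sum(axis=-1)   # index in {0, 1, 2}
    return idx - 1                 # map to {-1, 0, +1}

rng = np.random.default_rng(0)
H = W = 32
# Logits strongly favoring class 0 ("no change") yield a sparse trigger
# that perturbs well under 1% of the pixels, each by magnitude 1.
logits = np.zeros((H, W, 3))
logits[..., 1] = 6.0
trigger = sample_trigger(logits, rng)

benign = rng.integers(0, 256, size=(H, W))   # toy grayscale image
poisoned = np.clip(benign + trigger, 0, 255)
```

In the actual attack, the logits would be produced per benign input by the trained U-net rather than fixed as above, which is what makes the trigger sample-specific and statistically inconspicuous.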