Paper Title
Targeted Forgetting and False Memory Formation in Continual Learners through Adversarial Backdoor Attacks
Paper Authors
Paper Abstract
Artificial neural networks are well-known to be susceptible to catastrophic forgetting when continually learning from sequences of tasks. Various continual (or "incremental") learning approaches have been proposed to avoid catastrophic forgetting, but they are typically adversary-agnostic, i.e., they do not consider the possibility of a malicious attack. In this effort, we explore the vulnerability of Elastic Weight Consolidation (EWC), a popular continual learning algorithm for avoiding catastrophic forgetting. We show that an intelligent adversary can bypass EWC's defenses and instead cause gradual and deliberate forgetting by introducing small amounts of misinformation to the model during training. We demonstrate such an adversary's ability to assume control of the model via injection of "backdoor" attack samples on both permuted and split benchmark variants of the MNIST dataset. Importantly, once the model has learned the adversarial misinformation, the adversary can then control the amount of forgetting of any task. Equivalently, the malicious actor can create a "false memory" about any task by inserting carefully designed backdoor samples into any fraction of the test instances of that task. Perhaps most damagingly, we show this vulnerability to be very acute: neural network memory can be easily compromised with the addition of backdoor samples into as little as 1% of the training data of even a single task.
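To make the mechanism summarized in the abstract concrete, below is a minimal Python sketch, not the paper's implementation. It illustrates the two ingredients involved: the quadratic EWC penalty that consolidates weights important to earlier tasks, and an adversary stamping a small trigger pattern onto roughly 1% of one task's training images while relabeling them. The function names, the 2x2 corner trigger, and the chosen target label are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (illustrative assumptions; not the paper's actual code) of:
#  (a) the EWC regularizer that protects weights deemed important to old tasks,
#  (b) backdoor-sample injection into a small fraction (~1%) of one task's data.
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Quadratic EWC term: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    where theta_star are the weights after the previous task and fisher is
    the diagonal Fisher information estimate for those weights."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

def add_backdoor_trigger(images, labels, target_label, fraction=0.01, seed=0):
    """Stamp a small pixel-pattern trigger onto `fraction` of the images and
    relabel them with an adversary-chosen label -- the 'misinformation' that
    the model is induced to learn. The 2x2 corner patch is an assumed trigger."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = max(1, int(fraction * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -2:, -2:] = 1.0   # bright trigger patch in the corner
    labels[idx] = target_label    # false label the adversary wants associated
    return images, labels, idx

# Toy usage on random MNIST-shaped data (28x28 images, 10 classes).
x = np.random.rand(1000, 28, 28).astype(np.float32)
y = np.random.randint(0, 10, size=1000)
x_poisoned, y_poisoned, poisoned_idx = add_backdoor_trigger(x, y, target_label=0)
print(f"poisoned {len(poisoned_idx)} of {len(x)} training samples (~1%)")
```

In this reading of the attack, the poisoned samples are learned alongside the clean task data; at test time, presenting inputs carrying the same trigger lets the adversary steer predictions, which is how the "false memory" for a targeted task would be induced.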