Paper Title
Not All Poisons are Created Equal: Robust Training against Data Poisoning
Paper Authors
Paper Abstract
Data poisoning causes misclassification of test-time target examples by injecting maliciously crafted samples into the training data. Existing defenses are often effective only against a specific type of targeted attack, significantly degrade the generalization performance, or are prohibitive for standard deep learning pipelines. In this work, we propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks and provides theoretical guarantees for the performance of the model. Targeted attacks work by adding bounded perturbations to a randomly selected subset of training data to match the targets' gradient or representation. We show that: (i) under bounded perturbations, only a limited number of poisons can be optimized to have a gradient close enough to that of the target to make the attack successful; (ii) such effective poisons move away from their original class and become isolated in the gradient space; (iii) dropping examples in low-density gradient regions during training successfully eliminates the effective poisons and guarantees training dynamics similar to those of training on the full data. Our extensive experiments show that our method significantly decreases the success rate of state-of-the-art targeted attacks, including Gradient Matching and Bullseye Polytope, and easily scales to large datasets.
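To make point (iii) concrete, below is a minimal sketch of how "dropping examples in low-density gradient regions" could look inside a training step. This is not the authors' exact algorithm: the last-layer gradient proxy, the k-nearest-neighbor distance used as a density estimate, and the `drop_frac` parameter are assumptions introduced only for illustration.

```python
# Illustrative sketch (assumed, not the paper's exact method): approximate each
# example's gradient by the cross-entropy gradient w.r.t. the logits, estimate
# density in gradient space via the k-th nearest-neighbor distance, and drop
# the most isolated examples before computing the training loss.
import torch
import torch.nn.functional as F


def last_layer_gradient_proxy(logits, labels):
    """Per-example gradient of the cross-entropy loss w.r.t. the logits.

    For cross-entropy this equals softmax(logits) - one_hot(labels), a cheap
    stand-in for the full per-example gradient.
    """
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(labels, num_classes=logits.size(1)).float()
    return probs - one_hot


def low_density_mask(grads, k=5, drop_frac=0.1):
    """Return a boolean mask keeping examples in high-density gradient regions.

    Density is approximated by the distance to the k-th nearest neighbor in
    gradient space; the drop_frac fraction with the largest k-NN distance
    (i.e., the most isolated gradients) is marked for dropping.
    """
    dists = torch.cdist(grads, grads)                          # pairwise distances
    knn_dist = dists.topk(k + 1, largest=False).values[:, -1]  # skip self (distance 0)
    num_drop = int(drop_frac * grads.size(0))
    drop_idx = knn_dist.topk(num_drop).indices
    keep = torch.ones(grads.size(0), dtype=torch.bool)
    keep[drop_idx] = False
    return keep


# Hypothetical usage inside a training step (model, images, labels assumed):
#   logits = model(images)
#   grads = last_layer_gradient_proxy(logits, labels)
#   keep = low_density_mask(grads, k=5, drop_frac=0.1)
#   loss = F.cross_entropy(logits[keep], labels[keep])
```

The k-NN distance is just one possible density proxy; the key idea the sketch tries to convey is that effective poisons end up isolated in gradient space, so examples with unusually distant gradient neighbors are the ones excluded from the update.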