Paper Title

Neural Network Laundering: Removing Black-Box Backdoor Watermarks from Deep Neural Networks

Authors

William Aiken, Hyoungshick Kim, Simon Woo

Abstract

Creating a state-of-the-art deep-learning system requires vast amounts of data, expertise, and hardware, yet research into embedding copyright protection for neural networks has been limited. One of the main methods for achieving such protection relies on the susceptibility of neural networks to backdoor attacks, but the robustness of these tactics has primarily been evaluated against pruning, fine-tuning, and model inversion attacks. In this work, we propose a neural network "laundering" algorithm to remove black-box backdoor watermarks from neural networks even when the adversary has no prior knowledge of the structure of the watermark. We are able to effectively remove watermarks used in recent defense and copyright protection mechanisms while achieving test accuracies above 97% and 80% for MNIST and CIFAR-10, respectively. For all backdoor watermarking methods addressed in this paper, we find that the robustness of the watermark is significantly weaker than originally claimed. We also demonstrate the feasibility of our algorithm on more complex tasks, as well as in more realistic scenarios where the adversary can carry out efficient laundering attacks using less than 1% of the original training set size, demonstrating that existing backdoor watermarks do not live up to their robustness claims.
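The black-box backdoor watermarking the abstract refers to can be illustrated with a small sketch: the model owner builds a secret "trigger set" of inputs stamped with a trigger pattern and all assigned an owner-chosen target label, then verifies ownership by querying a suspect model and measuring how often it returns that label. The function names, the corner-patch trigger, and the toy data below are illustrative assumptions, not the specific scheme evaluated in the paper.

```python
import numpy as np

def stamp_trigger(images, patch_value=1.0, patch_size=3):
    """Stamp a small square trigger patch into the bottom-right corner
    of each image. Real schemes use varied triggers (noise patterns,
    abstract images, etc.); a corner patch is just a common example."""
    triggered = images.copy()
    triggered[:, -patch_size:, -patch_size:] = patch_value
    return triggered

def watermark_accuracy(model_predict, trigger_images, target_label):
    """Black-box verification: the fraction of trigger inputs the queried
    model classifies as the owner-chosen target label. A laundering
    attack succeeds when it drives this value toward chance level
    while preserving clean test accuracy."""
    preds = model_predict(trigger_images)
    return float(np.mean(preds == target_label))

# Toy trigger set from MNIST-shaped inputs (28x28 grayscale).
rng = np.random.default_rng(0)
clean = rng.random((16, 28, 28))
trigger_set = stamp_trigger(clean)
target_label = 7  # owner-chosen label for every triggered input
```

A watermarked model would answer `target_label` on nearly the whole trigger set, so `watermark_accuracy` near 1.0 supports the ownership claim; after laundering, the same query drops toward the ~10% expected by chance on a 10-class task.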
