Paper Title
Fine-Tuning Is All You Need to Mitigate Backdoor Attacks
Paper Authors
Paper Abstract
Backdoor attacks represent one of the major threats to machine learning models. Various efforts have been made to mitigate backdoors. However, existing defenses have become increasingly complex, often requiring high computational resources or jeopardizing model utility. In this work, we show that fine-tuning, one of the most common and easy-to-adopt machine learning training operations, can effectively remove backdoors from machine learning models while maintaining high model utility. Extensive experiments over three machine learning paradigms show that fine-tuning and our newly proposed super-fine-tuning achieve strong defense performance. Furthermore, we coin a new term, backdoor sequela, to measure the changes in a model's vulnerability to other attacks before and after the backdoor has been removed. Empirical evaluation shows that, compared to other defense methods, super-fine-tuning leaves limited backdoor sequela. We hope our results can help machine learning model owners better protect their models from backdoor threats. Our findings also call for the design of more advanced attacks to comprehensively assess machine learning models' backdoor vulnerabilities.
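To make the defense concrete, below is a minimal PyTorch sketch of the core idea: the model owner continues training a possibly backdoored classifier on a small clean dataset. The ResNet-18 architecture, CIFAR-10 data, checkpoint path, epoch count, and cyclical learning-rate schedule here are illustrative assumptions, not the paper's exact recipe; the abstract's "super-fine-tuning" involves a specific learning-rate strategy, for which a cyclical schedule serves only as a plausible stand-in.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A possibly backdoored classifier owned by the defender.
# (Hypothetical checkpoint path; architecture is an assumption.)
model = models.resnet18(num_classes=10).to(device)
# model.load_state_dict(torch.load("possibly_backdoored.pt"))

# A small clean dataset held by the model owner.
clean_data = datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor(),
)
loader = DataLoader(clean_data, batch_size=128, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Cyclical learning rate as an illustrative stand-in for the paper's
# super-fine-tuning schedule (assumption, not the exact method).
scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.1, step_size_up=500
)

# Fine-tune on clean data for a few epochs; the backdoor behavior is
# expected to degrade while clean accuracy is maintained.
model.train()
for epoch in range(5):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
```

The appeal of this defense, as the abstract argues, is that it reuses a standard training loop end to end: no trigger reconstruction, pruning, or auxiliary detection model is required.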