Paper Title

Effective Targeted Attacks for Adversarial Self-Supervised Learning

Authors

Minseon Kim, Hyeonjeong Ha, Sooel Son, Sung Ju Hwang

Abstract

Recently, unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information. Previous studies in unsupervised AT have mostly focused on implementing self-supervised learning (SSL) frameworks, which maximize the instance-wise classification loss to generate adversarial examples. However, we observe that simply maximizing the self-supervised training loss with an untargeted adversarial attack often results in ineffective adversaries that may not help improve the robustness of the trained model, especially for non-contrastive SSL frameworks without negative examples. To tackle this problem, we propose a novel positive-mining strategy for targeted adversarial attacks to generate effective adversaries for adversarial SSL frameworks. Specifically, we introduce an algorithm that selects the most confusing yet similar target example for a given instance based on entropy and similarity, and subsequently perturbs the given instance towards the selected target. Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and smaller but consistent robustness improvements with contrastive SSL frameworks, on the benchmark datasets.
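The target-selection step described above (picking the most "confusing yet similar" example using entropy and similarity) can be sketched in NumPy. This is a minimal illustrative sketch, not the authors' exact algorithm: the scoring rule (equal-weight sum of cosine similarity to the anchor and the entropy of each candidate's softmax-normalized similarity distribution), the temperature `tau`, and the function name `select_target` are all assumptions for illustration.

```python
import numpy as np

def select_target(anchor_emb, candidate_embs, tau=0.1):
    """Pick the candidate most 'confusing yet similar' to the anchor.

    Illustrative scoring (an assumption, not the paper's exact rule):
    combine cosine similarity to the anchor with the entropy of each
    candidate's softmax similarity distribution over all candidates;
    high entropy = confusing, high cosine similarity = close.
    """
    def normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    a = normalize(anchor_emb[None, :])        # shape (1, d)
    c = normalize(candidate_embs)             # shape (n, d)

    sim_to_anchor = (c @ a.T).squeeze(-1)     # (n,) cosine similarities

    # entropy of each candidate's similarity distribution over candidates
    logits = c @ c.T / tau                    # (n, n) scaled similarities
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)         # row-wise softmax
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)  # (n,)

    # equal weighting of the two criteria (an assumption)
    score = sim_to_anchor + entropy
    return int(np.argmax(score))

# toy usage: one anchor embedding, four candidate embeddings
rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
cands = rng.normal(size=(4, 8))
idx = select_target(anchor, cands)
```

Once a target is chosen, the attack perturbs the instance toward the target's embedding (e.g. with a PGD-style projected gradient step minimizing the distance between the two embeddings) rather than simply maximizing the untargeted SSL loss.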
