Paper Title
Data-free Defense of Black Box Models Against Adversarial Attacks
Paper Authors
Paper Abstract
Several companies often safeguard their trained deep models (i.e., details of architecture, learnt weights, training details, etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data due to proprietary reasons or sensitivity concerns. In this work, we propose a novel defense mechanism for black box models against adversarial attacks in a data-free setup. We construct synthetic data via a generative model and train a surrogate network using model stealing techniques. To minimize adversarial contamination on perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully selects only a few important coefficients determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image after noise removal via WNR, we further train a 'regenerator' network with the objective of retrieving the coefficients such that the reconstructed image yields predictions on the surrogate model similar to those of the original image. At test time, the WNR combined with the trained regenerator network is prepended to the black box network, resulting in a large boost in adversarial accuracy. Our method improves the adversarial accuracy on CIFAR-10 by 38.98% and 32.01% against the state-of-the-art Auto Attack compared to the baseline, even when the attacker uses surrogate architectures (Alexnet-half and Alexnet) similar to the black box architecture (Alexnet) with the same model stealing strategy as the defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense
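For intuition, below is a minimal sketch of the wavelet-based noise-removal idea described in the abstract, written against the PyWavelets library. It is not the authors' implementation: the simple top-k magnitude criterion and the keep_ratio parameter are hypothetical stand-ins for the coefficient selection performed by the paper's WCSM, and the regenerator network that restores high-frequency content is omitted.

    # Minimal sketch (assumption-laden) of the wavelet noise-removal step.
    # The top-k magnitude rule below is a placeholder for the paper's WCSM,
    # not the actual selection criterion used in the work.
    import numpy as np
    import pywt  # PyWavelets

    def wavelet_noise_removal(image: np.ndarray, wavelet: str = "haar",
                              level: int = 2, keep_ratio: float = 0.1) -> np.ndarray:
        """Multi-level 2D DWT of a single-channel image, keeping only the
        largest-magnitude detail coefficients before reconstruction."""
        # coeffs = [cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)]
        coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)

        # Pool all detail coefficients to pick a global magnitude threshold.
        detail_values = np.concatenate([np.abs(band).ravel()
                                        for bands in coeffs[1:] for band in bands])
        k = max(1, int(keep_ratio * detail_values.size))
        threshold = np.sort(detail_values)[-k]  # k-th largest magnitude

        # Zero out small detail coefficients; keep the approximation band cA_n as-is.
        cleaned = [coeffs[0]]
        for bands in coeffs[1:]:
            cleaned.append(tuple(np.where(np.abs(b) >= threshold, b, 0.0) for b in bands))

        # Inverse DWT back to the image domain.
        return pywt.waverec2(cleaned, wavelet=wavelet)

    if __name__ == "__main__":
        # Example on a CIFAR-10-sized input (32x32x3), processed per channel.
        x = np.random.rand(32, 32, 3).astype(np.float32)  # placeholder (e.g., a perturbed sample)
        x_clean = np.stack([wavelet_noise_removal(x[..., c]) for c in range(3)], axis=-1)
        print(x_clean.shape)  # (32, 32, 3)

In the paper's pipeline, the output of this denoising stage would then be passed through the trained regenerator network before being fed to the black box classifier at test time.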