Paper Title

Adversarial Detection and Correction by Matching Prediction Distributions

Authors

Giovanni Vacanti, Arnaud Van Looveren

Abstract

We present a novel adversarial detection and correction method for machine learning classifiers. The detector consists of an autoencoder trained with a custom loss function based on the Kullback-Leibler divergence between the classifier predictions on the original and reconstructed instances. The method is unsupervised, easy to train and does not require any knowledge about the underlying attack. The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST, and remains very effective on CIFAR-10 when the attack is granted full access to the classification model but not the defence. We show that our method is still able to detect the adversarial examples in the case of a white-box attack where the attacker has full knowledge of both the model and the defence and investigate the robustness of the attack. The method is very flexible and can also be used to detect common data corruptions and perturbations which negatively impact the model performance. We illustrate this capability on the CIFAR-10-C dataset.
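The core quantity in the abstract is the Kullback-Leibler divergence between the classifier's predictions on an original instance and on its autoencoder reconstruction. The sketch below illustrates that divergence as both a training loss and a per-instance adversarial score; it is a minimal NumPy illustration under our own assumptions (batched softmax outputs as arrays, function names ours), not the authors' implementation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kld_scores(p_orig, p_recon, eps=1e-12):
    # Per-instance D_KL(p_orig || p_recon):
    #   p_orig  - classifier predictions on the original instances x
    #   p_recon - classifier predictions on the reconstructions AE(x)
    # A benign x reconstructs well, so the two prediction distributions
    # match and the score is near zero; an adversarial x yields a large score.
    p = np.clip(p_orig, eps, 1.0)
    q = np.clip(p_recon, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def kld_loss(p_orig, p_recon):
    # Batch-averaged KL divergence, usable as the autoencoder training loss.
    return float(np.mean(kld_scores(p_orig, p_recon)))
```

Detection then amounts to thresholding `kld_scores`, and correction to replacing the classifier's prediction on a flagged instance with its prediction on the reconstruction.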
