Paper Title
SAD: Saliency-based Defenses Against Adversarial Examples
Paper Authors
Paper Abstract
With the rise in popularity of machine and deep learning models, there is an increased focus on their vulnerability to malicious inputs. These adversarial examples drift model predictions away from the original intent of the network and are a growing concern in practical security. To combat these attacks, neural networks can leverage traditional image processing approaches or state-of-the-art defensive models to reduce perturbations in the data. Defenses that apply noise reduction globally are effective against adversarial attacks; however, their lossy nature often distorts important data within the image. In this work, we propose a visual-saliency-based approach to cleaning data affected by an adversarial attack. Our model leverages the salient regions of an adversarial image to provide a targeted countermeasure while comparatively reducing loss within the cleaned images. We measure the accuracy of our model by evaluating the effectiveness of state-of-the-art saliency methods prior to attack, under attack, and after the application of cleaning methods. We demonstrate the effectiveness of our proposed approach against established adversarial attack methods and in comparison with related defenses, across two saliency datasets. Our targeted approach shows significant improvements across a range of standard statistical and distance-based saliency metrics, compared with both traditional and state-of-the-art approaches.
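The targeted-cleaning idea described above can be sketched in a few lines: a saliency map weights how strongly each pixel is denoised, so salient regions receive the countermeasure while the rest of the image stays close to the original. The sketch below is illustrative only and is not the paper's actual model; placeholder_saliency and the Gaussian-blur denoiser are assumed stand-ins for a state-of-the-art saliency method and defensive filter.

import numpy as np
from scipy.ndimage import gaussian_filter


def placeholder_saliency(image):
    """Crude local-contrast map in [0, 1]; a stand-in for a real saliency model."""
    gray = image.mean(axis=-1)
    coarse = gaussian_filter(gray, sigma=5)
    sal = np.abs(gray - coarse)
    return (sal - sal.min()) / (np.ptp(sal) + 1e-8)


def saliency_targeted_clean(adv_image, sigma=1.5):
    """Blend a denoised copy into the image only where saliency is high."""
    weights = placeholder_saliency(adv_image)[..., None]            # H x W x 1 blend weights
    denoised = gaussian_filter(adv_image, sigma=(sigma, sigma, 0))  # smooth spatial axes only
    # Salient pixels receive the countermeasure; non-salient pixels stay
    # close to the original, limiting the loss introduced by cleaning.
    return weights * denoised + (1.0 - weights) * adv_image


if __name__ == "__main__":
    adv = np.random.rand(224, 224, 3).astype(np.float32)  # stand-in for an attacked image
    cleaned = saliency_targeted_clean(adv)
    print(cleaned.shape, cleaned.min(), cleaned.max())

In contrast, a global defense would apply the denoiser to every pixel uniformly, which is the lossy behaviour the abstract argues against.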