Metrics and methods for robustness evaluation of neural networks with generative models

Paper title

Metrics and methods for robustness evaluation of neural networks with generative models

Authors

Igor Buzhinsky, Arseny Nerinovsky, Stavros Tripakis

Abstract

Recent studies have shown that modern deep neural network classifiers are easy to fool, assuming that an adversary is able to slightly modify their inputs. Many papers have proposed adversarial attacks, defenses and methods to measure robustness to such adversarial perturbations. However, the most commonly considered adversarial examples are based on $\ell_p$-bounded perturbations in the input space of the neural network, which are unlikely to arise naturally. Recently, especially in computer vision, researchers discovered "natural" or "semantic" perturbations, such as rotations, changes of brightness, or higher-level changes, but these perturbations have not yet been systematically utilized to measure the performance of classifiers. In this paper, we propose several metrics to measure robustness of classifiers to natural adversarial examples, and methods to evaluate them. These metrics, called latent space performance metrics, are based on the ability of generative models to capture probability distributions, and are defined in their latent spaces. On three image classification case studies, we evaluate the proposed metrics for several classifiers, including ones trained in conventional and robust ways. We find that the latent counterparts of adversarial robustness are associated with the accuracy of the classifier rather than its conventional adversarial robustness, but the latter is still reflected in the properties of the latent perturbations found. In addition, our novel method of finding latent adversarial perturbations demonstrates that these perturbations are often perceptually small.
