Paper Title

Exploring Adversarial Examples via Invertible Neural Networks

Paper Authors

Bai, Ruqi, Bagchi, Saurabh, Inouye, David I.

Paper Abstract

Adversarial examples (AEs) are images that mislead deep neural network (DNN) classifiers by introducing slight perturbations into the original images. This security vulnerability has led to extensive research in recent years because it can introduce real-world threats into systems that rely on neural networks. Yet a deep understanding of the characteristics of adversarial examples has remained elusive. We propose a new way of achieving such understanding through a recent development, namely invertible neural models with Lipschitz continuous mapping functions from the input to the output. With the ability to invert any latent representation back to its corresponding input image, we can investigate adversarial examples at a deeper level and disentangle the adversarial example's latent representation. Given this new perspective, we propose a fast latent space adversarial example generation method that could accelerate adversarial training. Moreover, this new perspective could contribute to new ways of detecting adversarial examples.
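The latent-space generation idea mentioned in the abstract can be pictured with a short sketch. The code below is a minimal illustration under assumed names, not the authors' implementation: f / f_inv stand in for an invertible network (here a toy orthogonal linear map, so the inverse is exact), classifier is a hypothetical linear head, and latent_space_attack runs a simple signed-gradient ascent on the latent code before inverting the perturbed code back to input space.

# Minimal sketch (assumptions, not the paper's method): perturb the latent
# code z = f(x) of an invertible map to raise the classifier loss, then
# invert the perturbed code back to an input-space adversarial example.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, num_classes = 8, 3

# Toy invertible map f(x) = x Q with Q orthogonal, so f_inv(z) = z Q^T.
Q, _ = torch.linalg.qr(torch.randn(dim, dim))

def f(x):        # input -> latent
    return x @ Q

def f_inv(z):    # latent -> input
    return z @ Q.T

classifier = torch.nn.Linear(dim, num_classes)  # hypothetical classifier head

def latent_space_attack(x, label, eps=0.5, steps=10, lr=0.1):
    """Signed-gradient ascent on the latent perturbation, inverted back to input space."""
    z0 = f(x).detach()
    delta = torch.zeros_like(z0, requires_grad=True)
    for _ in range(steps):
        logits = classifier(f_inv(z0 + delta))
        loss = F.cross_entropy(logits, label)
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()  # ascend the loss in latent space
            delta.clamp_(-eps, eps)          # keep the latent perturbation bounded
            delta.grad.zero_()
    return f_inv(z0 + delta).detach()

x = torch.randn(1, dim)
y = torch.tensor([0])
x_adv = latent_space_attack(x, y)
print("clean logits:", classifier(x).detach())
print("adv logits:  ", classifier(x_adv).detach())

Because the toy map is exactly invertible, the bound on the latent perturbation translates directly into a controlled change of the reconstructed input; an actual invertible network with a Lipschitz continuous mapping would play the role of f here.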
