Paper Title

When adversarial examples are excusable

Paper Authors

Pieter-Jan Kindermans, Charles Staats

Paper Abstract

Neural networks work remarkably well in practice, and theoretically they can be universal approximators. However, they still make mistakes, and a specific type of mistake, known as an adversarial error, seems inexcusable to humans. In this work, we analyze both test errors and adversarial errors on a well-controlled but highly non-linear visual classification problem. We find that, when approximating training on infinite data, test errors tend to be close to the ground-truth decision boundary. Qualitatively speaking, these are also more difficult for a human. By contrast, adversarial examples can be found almost everywhere and are often obvious mistakes. However, when we constrain adversarial examples to the manifold, we observe a 90% reduction in adversarial errors. If we inflate the manifold by training with Gaussian noise, we observe a similar effect. In both cases, the remaining adversarial errors tend to be close to the ground-truth decision boundary. Qualitatively, the remaining adversarial errors are similar to test errors on difficult examples. They do not have the customary quality of being inexcusable mistakes.
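The two interventions mentioned in the abstract, inflating the manifold with Gaussian-noise training and constraining the adversarial search to the manifold, can be illustrated with a small sketch. This is not the authors' code: the attack shown is a generic PGD search, the model and data are toy placeholders, and `project_to_manifold` is a hypothetical stand-in for the explicit manifold of the paper's controlled visual task.

```python
# Minimal sketch, assuming a PyTorch setup; not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def train_step(model, opt, x, y, noise_std=0.1):
    """One training step with Gaussian-noise augmentation ("inflating the manifold")."""
    model.train()
    x_noisy = x + noise_std * torch.randn_like(x)
    loss = F.cross_entropy(model(x_noisy), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


def project_to_manifold(x):
    """Hypothetical projection onto the data manifold; identity here as a placeholder."""
    return x


def pgd_attack(model, x, y, eps=0.1, alpha=0.02, steps=20, on_manifold=False):
    """PGD under an L-inf budget; optionally re-project each iterate onto the manifold."""
    model.eval()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # stay inside the eps-ball
        if on_manifold:
            x_adv = project_to_manifold(x_adv)         # constrain the search to the manifold
    return x_adv.detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(256, 2), torch.randint(0, 2, (256,))  # toy stand-in data
    for _ in range(100):
        train_step(model, opt, x, y, noise_std=0.1)
    x_adv = pgd_attack(model, x, y, on_manifold=True)
    print("accuracy on adversarial points:",
          (model(x_adv).argmax(1) == y).float().mean().item())
```

Comparing the attack with `on_manifold=True` versus `False` mirrors the comparison the abstract describes: unconstrained perturbations can wander off the data manifold, while the constrained (or noise-trained) setting leaves mainly errors near the ground-truth decision boundary.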
