The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples

Paper Title

The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples

Author

Freiesleben, Timo

Abstract

The same method that creates adversarial examples (AEs) to fool image-classifiers can be used to generate counterfactual explanations (CEs) that explain algorithmic decisions. This observation has led researchers to consider CEs as AEs by another name. We argue that the relationship to the true label and the tolerance with respect to proximity are two properties that formally distinguish CEs and AEs. Based on these arguments, we introduce CEs, AEs, and related concepts mathematically in a common framework. Furthermore, we show connections between current methods for generating CEs and AEs, and estimate that the fields will merge more and more as the number of common use-cases grows.
