Paper Title
Sparse Visual Counterfactual Explanations in Image Space
Paper Authors
Paper Abstract
Visual counterfactual explanations (VCEs) in image space are an important tool to understand decisions of image classifiers as they show under which changes of the image the decision of the classifier would change. Their generation in image space is challenging and requires robust models due to the problem of adversarial examples. Existing techniques to generate VCEs in image space suffer from spurious changes in the background. Our novel perturbation model for VCEs together with its efficient optimization via our novel Auto-Frank-Wolfe scheme yields sparse VCEs which lead to subtle changes specific for the target class. Moreover, we show that VCEs can be used to detect undesired behavior of ImageNet classifiers due to spurious features in the ImageNet dataset.
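To make the optimization idea in the abstract more concrete, here is a minimal sketch of the classical Frank-Wolfe method. It is not the authors' Auto-Frank-Wolfe scheme and it involves no image classifier. The l1-ball constraint, the toy quadratic objective, the 2/(k+2) step size, and the names `frank_wolfe_l1`, `toy_gradient`, and `target` are all assumptions chosen for illustration. The sketch only shows why Frank-Wolfe iterates tend to be sparse: each step moves toward a single vertex of the constraint set, so only one coordinate is touched per iteration.

```python
def frank_wolfe_l1(grad, dim, radius, steps=200):
    """Classical Frank-Wolfe over the l1-ball {x : ||x||_1 <= radius}.

    Illustrative only: the constraint set and step-size rule are assumptions,
    not the perturbation model or schedule used in the paper.
    """
    x = [0.0] * dim
    for k in range(steps):
        g = grad(x)
        # Linear minimization oracle for the l1-ball: the minimizer of <g, s>
        # is a signed, scaled unit vector at the largest-magnitude gradient entry.
        i = max(range(dim), key=lambda j: abs(g[j]))
        if g[i] == 0.0:
            break  # zero gradient: x is already stationary
        vertex_value = -radius if g[i] > 0 else radius
        gamma = 2.0 / (k + 2.0)  # standard diminishing step size
        # Convex combination of the current iterate and the chosen vertex;
        # only coordinate i receives new mass, which keeps x sparse.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma * vertex_value
    return x


# Hypothetical toy objective: f(x) = 0.5 * ||x - target||^2.
target = [3.0, 0.0, 0.0, 0.5, 0.0]


def toy_gradient(x):
    return [xj - tj for xj, tj in zip(x, target)]


x = frank_wolfe_l1(toy_gradient, dim=len(target), radius=1.0)
print(x)
```

In this toy problem the whole perturbation budget ends up on the first coordinate and the remaining coordinates stay at zero. This is a loose analogue of a change concentrated on a few pixels rather than spread across the background.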