GLANCE: Global to Local Architecture-Neutral Concept-based Explanations

Paper title

GLANCE: Global to Local Architecture-Neutral Concept-based Explanations

Authors

Avinash Kori, Ben Glocker, Francesca Toni

Abstract

Most current explainability techniques focus on capturing the importance of features in input space. However, given the complexity of models and data-generating processes, the resulting explanations are far from 'complete': they lack an indication of feature interactions and a visualization of their 'effect'. In this work, we propose a novel twin-surrogate explainability framework to explain the decisions made by any CNN-based image classifier, irrespective of its architecture. To do so, we first disentangle latent features from the classifier and then align these features to observed/human-defined 'context' features. The aligned features form semantically meaningful concepts, which are used to extract a causal graph depicting the 'perceived' data-generating process and describing the inter- and intra-feature interactions between unobserved latent features and observed 'context' features. This causal graph serves as a global model from which local explanations of different forms can be extracted. Specifically, we provide a generator to visualize the 'effect' of interactions among features in latent space and draw feature importance therefrom as local explanations. Our framework uses adversarial knowledge distillation to faithfully learn a representation from the classifier's latent space and uses it to extract visual explanations. We use the StyleGAN-v2 architecture with an additional regularization term to enforce disentanglement and alignment. We demonstrate and evaluate explanations obtained with our framework on Morpho-MNIST and on the FFHQ human faces dataset. Our framework is available at https://github.com/koriavinash1/GLANCE-Explanations.
