Paper Title


When are Post-hoc Conceptual Explanations Identifiable?

Authors

Tobias Leemann, Michael Kirchhof, Yao Rong, Enkelejda Kasneci, Gjergji Kasneci

Abstract


Interest in understanding and factorizing learned embedding spaces through conceptual explanations is steadily growing. When no human concept labels are available, concept discovery methods search trained embedding spaces for interpretable concepts like object shape or color that can provide post-hoc explanations for decisions. Unlike previous work, we argue that concept discovery should be identifiable, meaning that a number of known concepts can be provably recovered to guarantee reliability of the explanations. As a starting point, we explicitly make the connection between concept discovery and classical methods like Principal Component Analysis and Independent Component Analysis by showing that they can recover independent concepts under non-Gaussian distributions. For dependent concepts, we propose two novel approaches that exploit functional compositionality properties of image-generating processes. Our provably identifiable concept discovery methods substantially outperform competitors on a battery of experiments including hundreds of trained models and dependent concepts, where they exhibit up to 29 % better alignment with the ground truth. Our results highlight the strict conditions under which reliable concept discovery without human labels can be guaranteed and provide a formal foundation for the domain. Our code is available online.
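As a toy illustration of the abstract's claim that ICA can recover independent concepts under non-Gaussian distributions, the following sketch mixes two independent uniform (hence non-Gaussian) "concept" sources into a synthetic embedding space and recovers them with a minimal FastICA implemented in NumPy. The sources, the mixing matrix, and the embedding itself are all hypothetical stand-ins, not the paper's actual method or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent non-Gaussian "concepts" (uniform sources), linearly mixed
# into a toy 2-D "embedding" -- hypothetical stand-ins for attributes like
# object shape or color.
n = 5000
S = rng.uniform(-1, 1, size=(n, 2))      # independent, non-Gaussian sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])               # unknown mixing matrix (made up)
X = S @ A.T                              # observed embedding vectors

# Whiten the embeddings (zero mean, identity covariance).
X = X - X.mean(axis=0)
cov = X.T @ X / n
d, E = np.linalg.eigh(cov)
Xw = X @ E @ np.diag(d ** -0.5) @ E.T

# FastICA, deflation scheme with the tanh nonlinearity.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        wx = Xw @ w
        g, gp = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        w_new = Xw.T @ g / n - gp.mean() * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # orthogonalize vs. earlier rows
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < 1e-8
        w = w_new
        if converged:
            break
    W[i] = w

S_hat = Xw @ W.T  # recovered "concepts"

# ICA identifies sources only up to sign and permutation, so check that each
# recovered component correlates strongly with exactly one true source.
corr = np.abs(np.corrcoef(S.T, S_hat.T)[:2, 2:])
print(corr.round(2))
```

With Gaussian sources this recovery provably fails (any rotation of the whitened data is equally plausible), which is why non-Gaussianity is the key identifiability condition the abstract refers to.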
