Paper Title

Interpretable Representations in Explainable AI: From Theory to Practice

Authors

Kacper Sokol, Peter Flach

Abstract

Interpretable representations are the backbone of many explainers that target black-box predictive systems based on artificial intelligence and machine learning algorithms. They translate the low-level data representation necessary for good predictive performance into high-level human-intelligible concepts used to convey the explanatory insights. Notably, the explanation type and its cognitive complexity are directly controlled by the interpretable representation, tweaking which allows one to target a particular audience and use case. However, many explainers built upon interpretable representations overlook their merit and fall back on default solutions that often carry implicit assumptions, thereby degrading the explanatory power and reliability of such techniques. To address this problem, we study properties of interpretable representations that encode the presence and absence of human-comprehensible concepts. We demonstrate how they are operationalised for tabular, image and text data; discuss their assumptions, strengths and weaknesses; identify their core building blocks; and scrutinise their configuration and parameterisation. In particular, this in-depth analysis allows us to pinpoint their explanatory properties, desiderata and scope for (malicious) manipulation in the context of tabular data, where a linear model is used to quantify the influence of interpretable concepts on a black-box prediction. Our findings lead to a range of recommendations for designing trustworthy interpretable representations; specifically, the benefits of class-aware (supervised) discretisation of tabular data, e.g., with decision trees, and the sensitivity of image interpretable representations to segmentation granularity and occlusion colour.
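The pipeline the abstract describes for tabular data is LIME-like: discretise the features in a class-aware fashion, encode perturbed samples as binary interpretable concepts, and fit a linear surrogate whose coefficients quantify concept influence. Below is a minimal sketch of such a pipeline, assuming scikit-learn; it is not the authors' implementation, the random forest stands in for an arbitrary black box, and helper names such as `tree_bins` and `to_interpretable` are hypothetical.

```python
# Illustrative sketch of a LIME-style tabular explainer with class-aware
# (supervised) discretisation via decision trees. Not the paper's code.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def tree_bins(feature_column, labels, max_leaves=4):
    """Fit a shallow decision tree on one feature against the black box's
    own predictions and use its split thresholds as bin edges."""
    tree = DecisionTreeClassifier(max_leaf_nodes=max_leaves, random_state=0)
    tree.fit(feature_column.reshape(-1, 1), labels)
    # Internal nodes split on feature 0 (the only feature); leaves are -2.
    return np.sort(tree.tree_.threshold[tree.tree_.feature == 0])

bins = [tree_bins(X[:, j], black_box.predict(X)) for j in range(X.shape[1])]

def to_interpretable(samples, instance):
    """Binary interpretable representation: 1 iff a sample falls into the
    same bin as the explained instance for that feature."""
    codes = np.empty(samples.shape, dtype=int)
    for j, edges in enumerate(bins):
        codes[:, j] = (np.digitize(samples[:, j], edges)
                       == np.digitize(instance[j], edges))
    return codes

# Sample perturbations around the explained instance and query the black box.
instance = X[0]
rng = np.random.default_rng(0)
perturbed = instance + rng.normal(scale=X.std(axis=0), size=(1000, X.shape[1]))
explained_class = black_box.predict(instance.reshape(1, -1))[0]
target = black_box.predict_proba(perturbed)[:, explained_class]

# Linear surrogate: coefficients estimate each interpretable concept's
# influence on the black-box prediction.
surrogate = Ridge(alpha=1.0).fit(to_interpretable(perturbed, instance), target)
print(surrogate.coef_.round(3))
```

For images, the interpretable representation is typically a binary vector over superpixels, with "absent" concepts painted over. A companion sketch, assuming scikit-image, makes explicit the two parameters the abstract flags as sensitive: segmentation granularity (`n_segments`) and the occlusion colour.

```python
# Illustrative sketch of an image interpretable representation built from
# superpixels; names and defaults are assumptions, not the paper's setup.
import numpy as np
from skimage.data import astronaut
from skimage.segmentation import slic

image = astronaut()
segments = slic(image, n_segments=50, compactness=10)  # labels start at 1

def occlude(image, segments, keep, colour=(128, 128, 128)):
    """Render one binary interpretable instance: superpixels whose entry in
    `keep` is 0 are painted over with the occlusion colour."""
    out = image.copy()
    for seg_id in range(1, segments.max() + 1):
        if not keep[seg_id - 1]:
            out[segments == seg_id] = colour
    return out

rng = np.random.default_rng(0)
keep = rng.integers(0, 2, size=segments.max())  # one random concept vector
occluded = occlude(image, segments, keep)
```

Both sketches make the paper's central caution concrete: the surrogate's coefficients are only as meaningful as the bin edges, segmentation and occlusion colour that define the interpretable concepts.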
