Paper Title

ELUDE: Generating interpretable explanations via a decomposition into labelled and unlabelled features

Authors

Vikram V. Ramaswamy, Sunnie S. Y. Kim, Nicole Meister, Ruth Fong, Olga Russakovsky

Abstract

Deep learning models have achieved remarkable success in different areas of machine learning over the past decade; however, the size and complexity of these models make them difficult to understand. In an effort to make them more interpretable, several recent works focus on explaining parts of a deep neural network through human-interpretable, semantic attributes. However, it may be impossible to completely explain complex models using only semantic attributes. In this work, we propose to augment these attributes with a small set of uninterpretable features. Specifically, we develop a novel explanation framework ELUDE (Explanation via Labelled and Unlabelled DEcomposition) that decomposes a model's prediction into two parts: one that is explainable through a linear combination of the semantic attributes, and another that is dependent on the set of uninterpretable features. By identifying the latter, we are able to analyze the "unexplained" portion of the model, obtaining insights into the information used by the model. We show that the set of unlabelled features can generalize to multiple models trained with the same feature space and compare our work to two popular attribute-oriented methods, Interpretable Basis Decomposition and Concept Bottleneck, and discuss the additional insights ELUDE provides.
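The decomposition the abstract describes, a model score split into a linear combination of labelled semantic attributes plus a term over a small set of unlabelled features, can be illustrated with a short sketch. This is a minimal illustration on synthetic data, not the authors' implementation: the Lasso attribute fit, the PCA-based residual directions, and all names and shapes (`attributes`, `features`, `logits`, rank `r`) are assumptions made for demonstration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

# Synthetic stand-ins (all shapes and names are illustrative assumptions):
# n images, k labelled semantic attributes, d backbone-feature dimensions.
rng = np.random.default_rng(0)
n, k, d = 1000, 40, 512
attributes = rng.integers(0, 2, size=(n, k)).astype(float)  # labelled attributes
features = rng.normal(size=(n, d))                           # frozen feature space
logits = rng.normal(size=n)                                  # model score for one class

# Part 1: explain as much of the score as possible with a sparse
# linear combination of the labelled semantic attributes.
attr_model = Lasso(alpha=0.01).fit(attributes, logits)
explained = attr_model.predict(attributes)
residual = logits - explained  # the "unexplained" portion of the model

# Part 2: approximate the residual with a small number of unlabelled
# directions in the feature space. PCA is only a stand-in here (ELUDE
# learns its directions), but the role is the same: a compact,
# uninterpretable basis that absorbs what the attributes cannot.
r = 5
unlabelled = PCA(n_components=r).fit_transform(features)
coef, *_ = np.linalg.lstsq(unlabelled, residual, rcond=None)
reconstruction = explained + unlabelled @ coef

print("variance explained by attributes alone:",
      1 - residual.var() / logits.var())
print("variance explained with unlabelled features added:",
      1 - (logits - reconstruction).var() / logits.var())
```

The fraction of variance left to the unlabelled part quantifies how much of the model the semantic attributes fail to explain; in this sketch the rank `r` plays the role of the paper's "small set" of uninterpretable features, trading off faithfulness against how much of the model remains unexplained.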
