Paper Title
Do Explanations Explain? Model Knows Best
Paper Authors
Paper Abstract
It is a mystery which input features contribute to a neural network's output. Various explanation (feature attribution) methods have been proposed in the literature to shed light on the problem. One peculiar observation is that these explanations (attributions) point to different features as being important. This phenomenon raises the question: which explanation should we trust? We propose a framework for evaluating explanations using the neural network model itself. The framework leverages the network to generate input features that impose a particular behavior on the output. Using the generated features, we devise controlled experimental setups to evaluate whether an explanation method conforms to an axiom. Thus, we propose an empirical framework for axiomatic evaluation of explanation methods. We evaluate well-known and promising explanation solutions using the proposed framework. The framework provides a toolset to reveal properties and drawbacks in existing and future explanation solutions.
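The core idea admits a small illustration. The sketch below is a minimal, hypothetical example (not the authors' implementation): it uses a toy PyTorch model to generate a perturbation that leaves the output unchanged, then checks how much a simple gradient-times-input attribution shifts under that output-preserving change. The toy model, the optimization objective, and the choice of gradient-times-input as the attribution method are all assumptions made purely for illustration.

```python
# Hypothetical sketch of "model-generated" controlled inputs for evaluating
# an attribution method. Everything here (toy model, objective, attribution
# choice) is an illustrative assumption, not the paper's exact procedure.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network standing in for the model under explanation.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
model.eval()

x = torch.randn(1, 8)              # original input
with torch.no_grad():
    y_ref = model(x)               # reference output to preserve

# Use the model itself to generate a perturbation delta such that
# model(x + delta) stays at y_ref, i.e. the added features impose
# a "no change" behavior on the output.
delta = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    out = model(x + delta)
    # Keep the output fixed while pushing delta away from zero so the
    # perturbation is non-trivial (an illustrative objective).
    loss = (out - y_ref).pow(2).mean() - 0.01 * delta.abs().mean()
    loss.backward()
    opt.step()
delta = delta.detach()

def grad_times_input(inp):
    """Gradient x Input attribution (a simple stand-in for any method)."""
    inp = inp.detach().clone().requires_grad_(True)
    model(inp).sum().backward()
    return (inp.grad * inp).detach().squeeze()

# Controlled check: the perturbation does not change the output, so under a
# "no output change, no attribution change" style axiom we would expect the
# attribution picture to stay roughly the same.
a_orig = grad_times_input(x)
a_pert = grad_times_input(x + delta)
with torch.no_grad():
    drift = (model(x + delta) - y_ref).abs().item()
print("output drift:", drift)
print("attribution shift (L1):", (a_pert - a_orig).abs().sum().item())
```

A large attribution shift despite a negligible output drift would flag a potential violation of the tested axiom; this is the flavor of controlled experiment the abstract describes, reduced here to a toy setting.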