Paper Title

Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification

Paper Authors

Ruixuan Tang, Hanjie Chen, Yangfeng Ji

Paper Abstract

Some recent works have observed the instability of post-hoc explanations when input-side perturbations are applied to the model. This raises interest in, and concern about, the stability of post-hoc explanations. However, a question remains: is the instability caused by the neural network model or by the post-hoc explanation method? This work explores the potential source of unstable post-hoc explanations. To separate out the influence of the model, we propose a simple output probability perturbation method. Compared to prior input-side perturbation methods, the output probability perturbation method circumvents the neural model's potential effect on the explanations and allows analysis of the explanation method itself. We evaluate the proposed method with three widely used post-hoc explanation methods (LIME (Ribeiro et al., 2016), Kernel Shapley (Lundberg and Lee, 2017a), and Sample Shapley (Strumbelj and Kononenko, 2010)). The results demonstrate that the post-hoc methods are stable, barely producing discrepant explanations under output probability perturbations. This observation suggests that neural network models may be the primary source of fragile explanations.
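A minimal sketch of the idea described in the abstract, assuming the output probability perturbation amounts to adding small noise to the classifier's predicted probabilities before the explainer queries them; the paper may define the perturbation differently. The stand-in classifier, the Gaussian noise scale, and the choice of LIME as the explainer here are illustrative assumptions, not the authors' exact experimental setup.

```python
# Sketch: probe explanation stability by perturbing output probabilities
# rather than the input text. Assumes a black-box predict_proba over raw
# text and LIME as the post-hoc explainer (illustrative choices only).
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Stand-in binary text classifier: returns [p(neg), p(pos)] per input."""
    scores = np.array([min(t.lower().count("good"), 3) / 3.0 for t in texts])
    return np.stack([1.0 - scores, scores], axis=1)

def perturb_proba(probs, sigma=0.01, rng=np.random.default_rng(0)):
    """Add small Gaussian noise to output probabilities, then renormalize."""
    noisy = np.clip(probs + rng.normal(0.0, sigma, probs.shape), 1e-6, None)
    return noisy / noisy.sum(axis=1, keepdims=True)

def perturbed_predict_proba(texts):
    """Classifier wrapper whose outputs are perturbed before LIME sees them."""
    return perturb_proba(predict_proba(texts))

explainer = LimeTextExplainer(class_names=["neg", "pos"])
text = "the movie was good and the acting was good"

exp_clean = explainer.explain_instance(text, predict_proba, num_features=5)
exp_noisy = explainer.explain_instance(text, perturbed_predict_proba, num_features=5)

# Compare the two attributions: if they barely differ, the explanation
# method itself is stable under output-probability perturbation, pointing
# to the model as the likely source of explanation discrepancy.
print(dict(exp_clean.as_list()))
print(dict(exp_noisy.as_list()))
```

Because the perturbation is applied after the model's forward pass, any change in the resulting attributions can be credited to the explanation method alone, which is the separation the abstract describes.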
