Paper Title
Towards Faithful Model Explanation in NLP: A Survey
Paper Authors
Paper Abstract
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand. This has given rise to numerous efforts towards model explainability in recent years. One desideratum of model explanation is faithfulness, i.e., an explanation should accurately represent the reasoning process behind the model's prediction. In this survey, we review over 110 model explanation methods in NLP through the lens of faithfulness. We first discuss the definition and evaluation of faithfulness, as well as its significance for explainability. We then introduce recent advances in faithful explanation, grouping existing approaches into five categories: similarity-based methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models. For each category, we synthesize its representative studies, strengths, and weaknesses. Finally, we summarize their common virtues and remaining challenges, and reflect on future work directions towards faithful explainability in NLP.