Paper Title
A Unified Study of Machine Learning Explanation Evaluation Metrics
Paper Authors
Paper Abstract
The growing need for trustworthy machine learning has led to a blossoming of interpretability research, and numerous explanation methods have been developed to serve this purpose. However, these methods are deficiently and inappropriately evaluated. Many existing metrics for explanations were introduced by researchers as by-products of their proposed explanation techniques, to demonstrate the advantages of their methods. Although widely used, these metrics are more or less beset by problems. We claim that the lack of acknowledged and justified metrics results in chaos when benchmarking explanation methods -- do we really have a good/bad explanation when a metric gives a high/low score? We split existing metrics into two categories and demonstrate that they are insufficient to properly evaluate explanations, for multiple reasons. We propose guidelines for dealing with the problems in evaluating machine learning explanations, and encourage researchers to address these problems carefully when developing explanation techniques and metrics.
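To make the kind of metric under discussion concrete, below is a minimal sketch of a deletion-style faithfulness score, a common pattern in the explanation-evaluation literature (not a metric defined by this paper): features are removed in decreasing order of attributed importance, and the average drop in model output is taken as the score. The model, the zero-baseline for "removal", and all names here are hypothetical illustration choices.

```python
import numpy as np

def deletion_score(model_fn, x, attribution, steps=5):
    """Toy deletion-style faithfulness metric: remove the most
    important features first and average how much the model's
    output drops. A larger drop is read as a more faithful
    explanation. Zeroing a feature stands in for "removing" it,
    which is itself one of the debatable choices such metrics make."""
    order = np.argsort(-attribution)      # most important features first
    baseline = model_fn(x)
    x_masked = x.copy()
    step = max(1, len(order) // steps)
    drops = []
    for i in range(0, len(order), step):
        x_masked[order[i:i + step]] = 0.0
        drops.append(baseline - model_fn(x_masked))
    return float(np.mean(drops))

# Hypothetical linear model: output is a weighted sum of features.
w = np.array([3.0, 1.0, 0.5, 0.1])
model = lambda x: float(w @ x)
x = np.ones(4)

# An attribution matching the true weights scores higher than a
# reversed (deliberately wrong) attribution.
faithful = deletion_score(model, x, attribution=w)
unfaithful = deletion_score(model, x, attribution=w[::-1])
assert faithful > unfaithful
```

Even this tiny example exposes the paper's concern: the score depends on arbitrary choices (the removal baseline, the step size), so a high or low number need not certify a good or bad explanation.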