Paper title
Predictive and Causal Implications of using Shapley Value for Model Interpretation
Paper authors
Paper abstract
The Shapley value is a concept from game theory. Recently, it has been used to explain complex models produced by machine learning techniques. Although the mathematical definition of the Shapley value is straightforward, the implications of using it as a model interpretation tool have yet to be described. In this paper, we analyze the Shapley value in the Bayesian network framework. We establish the relationship between the Shapley value and conditional independence, a key concept in both predictive and causal modeling. Our results indicate that eliminating a variable with a high Shapley value from a model does not necessarily impair predictive performance, whereas eliminating a variable with a low Shapley value could impair performance. Therefore, using the Shapley value for feature selection does not, in the general case, result in the most parsimonious and predictively optimal model. More importantly, the Shapley value of a variable does not reflect its causal relationship with the target of interest.
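
As a rough illustration of the first claim in the abstract, the sketch below computes exact Shapley values for a toy cooperative game by averaging marginal contributions over all orderings of the players. The feature names `a`, `b`, `c` and the scoring function `toy_score` are hypothetical and are not taken from the paper: `a` and `b` are assumed to be redundant copies of the same signal and `c` is assumed to be uninformative. It is not the paper's Bayesian network analysis, only a minimal example under these assumptions.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering of the players."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_score(features):
    """Hypothetical predictive score of a model restricted to `features`.
    Assumption: 'a' and 'b' carry the same signal, 'c' carries none."""
    return 1 if ("a" in features or "b" in features) else 0


phi = shapley_values(["a", "b", "c"], toy_score)
full_score = toy_score(frozenset({"a", "b", "c"}))
score_without_a = toy_score(frozenset({"b", "c"}))

print(phi)
print(full_score, score_without_a)
```

In this toy game `a` shares the largest Shapley value with `b`, yet the score without `a` equals the score of the full feature set, because `b` supplies the same information. Enumerating all orderings costs factorial time, so this approach is only practical for a handful of players.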