An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models

Paper title

An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models

Authors

Yuan, Han; Liu, Mingxuan; Kang, Lican; Miao, Chenkui; Wu, Ying

Abstract

Nowadays, understanding why a machine learning (ML) model makes certain inferences is as crucial as the accuracy of those inferences. Some ML models, such as decision trees, possess inherent interpretability that humans can comprehend directly. Others, such as artificial neural networks (ANNs), rely on external methods to uncover their deduction mechanism. SHapley Additive exPlanations (SHAP) is one such external method, and it requires a background dataset when interpreting ANNs. Generally, a background dataset consists of instances randomly sampled from the training dataset. However, the sampling size and its effect on SHAP remain unexplored. In our empirical study on the MIMIC-III dataset, we show that the two core explanations, SHAP values and variable rankings, fluctuate when different background datasets are drawn by random sampling, indicating that users cannot unquestioningly trust a one-shot interpretation from SHAP. Fortunately, this fluctuation decreases as the background dataset size increases. We also observe a U-shape in the stability assessment of SHAP variable rankings, demonstrating that SHAP is more reliable in ranking the most and least important variables than moderately important ones. Overall, our results suggest that users should take into account how background data affects SHAP results, and that SHAP stability improves as the background sample size increases.
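
The dependence on the background sample described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' experiment and does not use the shap library or MIMIC-III: the `model` function, the synthetic `train` data, the instance `x`, and the `instability` measure are all hypothetical stand-ins chosen for illustration. Shapley values are computed exactly by enumerating feature subsets, with absent features filled in from background rows, and the spread of those values across resampled backgrounds is reported for several background sizes.

```python
import itertools
import math
import random
import statistics


def model(x):
    # Hypothetical stand-in for a trained network: a small nonlinear function.
    return math.tanh(x[0] + 0.5 * x[1] * x[2]) + 0.1 * x[3]


def shapley_values(f, x, background):
    """Exact Shapley values of f at x; absent features are drawn from background rows."""
    d = len(x)

    def value(subset):
        # Mean prediction when features in `subset` keep x's values and
        # the remaining features are taken from each background row in turn.
        total = 0.0
        for b in background:
            z = [x[j] if j in subset else b[j] for j in range(d)]
            total += f(z)
        return total / len(background)

    # Evaluate every coalition once (2**d of them, so keep d small).
    cache = {}
    for r in range(d + 1):
        for s in itertools.combinations(range(d), r):
            cache[frozenset(s)] = value(frozenset(s))

    phi = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        contrib = 0.0
        for r in range(d):
            # Standard Shapley weight for a coalition of size r.
            weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for s in itertools.combinations(others, r):
                s = frozenset(s)
                contrib += weight * (cache[s | {i}] - cache[s])
        phi.append(contrib)
    return phi


def instability(f, x, train, size, repeats, rng):
    """Mean over features of the std. dev. of Shapley values across resampled backgrounds."""
    runs = [shapley_values(f, x, rng.sample(train, size)) for _ in range(repeats)]
    return statistics.mean(statistics.pstdev(col) for col in zip(*runs))


rng = random.Random(0)
# Synthetic "training data" (an assumption; the paper uses MIMIC-III).
train = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2000)]
x = [1.0, -0.5, 2.0, 0.3]

for size in (10, 100, 1000):
    print(size, round(instability(model, x, train, size, 20, rng), 4))
```

Under these assumptions the printed spread should shrink as the background size grows, which mirrors the trend the abstract reports. Exact enumeration is only feasible for a handful of features; practical SHAP explainers rely on approximations, so the absolute numbers here say nothing about real models.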

Paper title