Paper title
The impact of feature importance methods on the interpretation of defect classifiers
Paper authors
Paper abstract
Classifier-specific (CS) and classifier-agnostic (CA) feature importance methods are widely used (often interchangeably) by prior studies to derive feature importance ranks from a defect classifier. However, different feature importance methods are likely to compute different feature importance ranks even for the same dataset and classifier. Hence, such interchangeable use of feature importance methods can lead to conclusion instabilities unless there is strong agreement among the different methods. Therefore, in this paper, we evaluate the agreement between the feature importance ranks associated with the studied classifiers through a case study of 18 software projects and six commonly used classifiers. We find that: 1) The feature importance ranks computed by CA and CS methods do not always strongly agree with each other. 2) The feature importance ranks computed by the studied CA methods exhibit strong agreement, including for the features reported at the top-1 and top-3 ranks for a given dataset and classifier, while even the commonly used CS methods yield vastly different feature importance ranks. Such findings raise concerns about the stability of conclusions across replicated studies. We further observe that the commonly used defect datasets are rife with feature interactions and that these feature interactions impact the feature importance ranks computed by the CS methods (but not the CA methods). We demonstrate that removing these feature interactions, even with simple methods like CFS, improves the agreement between the feature importance ranks computed by CA and CS methods. In light of our findings, we provide guidelines for stakeholders and practitioners performing model interpretation, as well as directions for future research; e.g., future research is needed to investigate the impact of advanced feature interaction removal methods on the feature importance ranks computed by different CS methods.
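To make the notion of agreement between two feature importance rankings concrete, the following is a minimal sketch. The abstract does not name the agreement measures used in the study, so Kendall's tau and a Jaccard-style top-k overlap are assumptions chosen for illustration. The feature names and importance scores are hypothetical and are not results from the paper.

```python
from itertools import combinations


def ranks_from_scores(scores):
    """Map each feature to its rank (1 = most important).

    Ties are broken by feature name, purely to keep this sketch deterministic.
    """
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def kendall_tau(ranks_a, ranks_b):
    """Kendall's tau-a between two tie-free rankings of the same features."""
    features = sorted(ranks_a)
    concordant = discordant = 0
    for f, g in combinations(features, 2):
        # Positive product: both rankings order the pair (f, g) the same way.
        direction = (ranks_a[f] - ranks_a[g]) * (ranks_b[f] - ranks_b[g])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(features) * (len(features) - 1) // 2
    return (concordant - discordant) / n_pairs


def top_k_overlap(ranks_a, ranks_b, k):
    """Jaccard overlap of the feature sets ranked in the top k by each method."""
    top_a = {f for f, r in ranks_a.items() if r <= k}
    top_b = {f for f, r in ranks_b.items() if r <= k}
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical importance scores for one dataset and one classifier:
# one set from a CS method, one from a CA method.
cs_scores = {"loc": 0.40, "churn": 0.30, "complexity": 0.20, "authors": 0.10}
ca_scores = {"loc": 0.35, "churn": 0.15, "complexity": 0.30, "authors": 0.20}

cs_ranks = ranks_from_scores(cs_scores)
ca_ranks = ranks_from_scores(ca_scores)

print(round(kendall_tau(cs_ranks, ca_ranks), 3))  # 0.333
print(top_k_overlap(cs_ranks, ca_ranks, 1))       # 1.0
print(top_k_overlap(cs_ranks, ca_ranks, 3))       # 0.5
```

In this toy case the two methods agree on the top-1 feature but only partially on the top-3 set, and the overall rank correlation is weak. This is the kind of disagreement that could lead two studies to different conclusions from the same classifier.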