Paper Title
Quantifying Feature Contributions to Overall Disparity Using Information Theory
Paper Authors
Abstract
When a machine-learning algorithm makes biased decisions, it can be helpful to understand the sources of disparity to explain why the bias exists. Toward this end, we examine the problem of quantifying the contribution of each individual feature to the observed disparity. If we have access to the decision-making model, one potential approach (inspired by intervention-based approaches in the explainability literature) is to vary each individual feature (while keeping the others fixed) and use the resulting change in disparity to quantify its contribution. However, we may not have access to the model or be able to test/audit its outputs while individually varying features. Furthermore, the decision may not always be a deterministic function of the input features (e.g., with a human in the loop). For these situations, we might need to explain contributions using purely distributional (i.e., observational) techniques, rather than interventional ones. We ask the question: what is the "potential" contribution of each individual feature to the observed disparity in the decisions when the exact decision-making mechanism is not accessible? We first provide canonical examples (thought experiments) that illustrate the difference between distributional and interventional approaches to explaining contributions, and when each is better suited. When unable to intervene on the inputs, we quantify the "redundant" statistical dependency on the protected attribute that is present in both the final decision and an individual feature, by leveraging a body of work in information theory called Partial Information Decomposition. We also perform a simple case study to show how this technique could be applied to quantify contributions.
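To make the "redundant dependency" idea concrete, the sketch below computes one standard redundancy measure from the Partial Information Decomposition literature, the Williams–Beer I_min, for a target Z (the protected attribute) and two sources (a feature X and the decision Y). This is an illustrative implementation of a well-known PID measure, not necessarily the exact measure the paper uses; the joint distribution and variable names are assumptions for the example.

```python
from collections import defaultdict
from math import log2


def _marginal(joint, idx):
    """Marginalize a joint distribution (dict: outcome tuple -> prob) onto the given indices."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in idx)] += p
    return m


def specific_information(joint_za, z):
    """Specific information I(Z=z; A): how much a source A tells us about the outcome Z=z."""
    pz = sum(p for (zz, _), p in joint_za.items() if zz == z)
    pa = _marginal(joint_za, (1,))
    si = 0.0
    for (zz, a), p in joint_za.items():
        if zz == z and p > 0:
            # p/pz = p(a|z), p/pa = p(z|a); compare posterior to prior in bits
            si += (p / pz) * (log2(p / pa[(a,)]) - log2(pz))
    return si


def redundancy(joint_zxy):
    """Williams-Beer I_min: redundant information that sources X and Y share about target Z.

    joint_zxy: dict mapping (z, x, y) -> probability.
    """
    joint_zx = _marginal(joint_zxy, (0, 1))
    joint_zy = _marginal(joint_zxy, (0, 2))
    pz = _marginal(joint_zxy, (0,))
    # For each outcome z, take the minimum specific information over sources.
    return sum(
        p * min(specific_information(joint_zx, z), specific_information(joint_zy, z))
        for (z,), p in pz.items()
    )


# Toy example: the protected attribute Z is a fair coin, the feature X copies Z,
# and the decision Y also copies Z, so X and Y redundantly carry 1 bit about Z.
joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(round(redundancy(joint), 3))  # -> 1.0
```

If the feature X were instead independent of Z (while Y still copied Z), the redundancy would drop to 0, reflecting that X contributes nothing to the disparity even though Y is fully determined by Z.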