Paper Title

Bursting the Burden Bubble? An Assessment of Sharma et al.'s Counterfactual-based Fairness Metric

Paper Authors

Yochem van Rosmalen, Florian van der Steen, Sebastiaan Jans, Daan van der Weijden

Paper Abstract

Machine learning has seen an increase in negative publicity in recent years, due to biased, unfair, and uninterpretable models. There is a rising interest in making machine learning models more fair for unprivileged communities, such as women or people of color. Metrics are needed to evaluate the fairness of a model. A novel metric for evaluating fairness between groups is Burden, which uses counterfactuals to approximate the average distance of negatively classified individuals in a group to the decision boundary of the model. The goal of this study is to compare Burden to statistical parity, a well-known fairness metric, and discover Burden's advantages and disadvantages. We do this by calculating the Burden and statistical parity of a sensitive attribute in three datasets: two synthetic datasets are created to display differences between the two metrics, and one real-world dataset is used. We show that Burden can show unfairness where statistical parity cannot, and that the two metrics can even disagree on which group is treated unfairly. We conclude that Burden is a valuable metric, but does not replace statistical parity: rather, it is valuable to use both.
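The abstract compresses both metrics into a sentence each; a minimal sketch may make the contrast concrete. This is not the authors' code: Sharma et al. approximate the distance to the decision boundary with counterfactuals, whereas the sketch below assumes a linear classifier so that distance has a closed form. All names here (`statistical_parity_difference`, `burden`, `w`, `b`) are hypothetical.

```python
# Minimal sketch of the two fairness metrics, assuming a binary sensitive
# attribute and a linear classifier y_hat = 1[w.x + b > 0]. Not the paper's
# implementation; Burden's counterfactual search is replaced by the exact
# point-to-hyperplane distance, which a linear model makes available.
import numpy as np

def statistical_parity_difference(y_pred, group):
    # Difference in positive-classification rates between the two groups:
    # P(y_hat = 1 | group = 0) - P(y_hat = 1 | group = 1).
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

def burden(X, y_pred, group, w, b):
    # Per-group mean distance of negatively classified points to the
    # decision boundary w.x + b = 0, standing in for the mean distance
    # to the nearest counterfactual.
    out = {}
    for g in np.unique(group):
        neg = (group == g) & (y_pred == 0)
        dists = np.abs(X[neg] @ w + b) / np.linalg.norm(w)
        # groups with no negatively classified members get 0.0 here
        out[g] = dists.mean() if neg.any() else 0.0
    return out

# Toy usage on synthetic data (hypothetical numbers, not the paper's datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
group = rng.integers(0, 2, size=200)   # binary sensitive attribute
w, b = np.array([1.0, -0.5]), 0.1      # fixed linear classifier
y_pred = (X @ w + b > 0).astype(int)

print("statistical parity difference:", statistical_parity_difference(y_pred, group))
print("burden per group:", burden(X, y_pred, group, w, b))
```

The sketch also shows why the metrics can disagree, as the paper argues: parity only counts how many individuals per group are classified positively, while Burden measures how far a group's rejected members sit from a positive outcome, so two groups can have equal positive rates yet very different Burden values.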
