Paper Title

A Statistical Test for Probabilistic Fairness

Authors

Bahar Taskesen, Jose Blanchet, Daniel Kuhn, Viet Anh Nguyen

Abstract


Algorithms are now routinely used to make consequential decisions that affect human lives. Examples include college admissions, medical interventions, and law enforcement. While algorithms empower us to harness all information hidden in vast amounts of data, they may inadvertently amplify existing biases in the available datasets. This concern has sparked increasing interest in fair machine learning, which aims to quantify and mitigate algorithmic discrimination. Indeed, machine learning models should undergo intensive tests to detect algorithmic biases before being deployed at scale. In this paper, we use ideas from the theory of optimal transport to propose a statistical hypothesis test for detecting unfair classifiers. Leveraging the geometry of the feature space, the test statistic quantifies the distance of the empirical distribution supported on the test samples to the manifold of distributions that render a pre-trained classifier fair. We develop a rigorous hypothesis testing mechanism for assessing the probabilistic fairness of any pre-trained logistic classifier, and we show both theoretically and empirically that the proposed test is asymptotically correct. In addition, the proposed framework offers interpretability by identifying the most favorable perturbation of the data so that the given classifier becomes fair.
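To make the testing idea concrete, the sketch below illustrates a much simpler hypothesis test for probabilistic fairness of a logistic classifier: it measures the gap in mean predicted probability between two sensitive groups and calibrates it with a permutation test. This is a hypothetical stand-in for intuition only; the paper's actual test statistic is based on the optimal-transport projection of the empirical distribution onto the manifold of fair distributions, which is not implemented here. All function names and the toy data are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # Logistic link: maps scores to predicted probabilities.
    return 1.0 / (1.0 + np.exp(-z))

def parity_gap(X, a, theta):
    """Gap in mean predicted probability between sensitive groups.

    Probabilistic (statistical) parity asks that the classifier's average
    predicted probability be equal across the groups a == 0 and a == 1.
    """
    p = sigmoid(X @ theta)
    return abs(p[a == 1].mean() - p[a == 0].mean())

def permutation_test(X, a, theta, n_perm=2000, seed=0):
    """Permutation test: under the null of group-blind predictions, the
    sensitive labels are exchangeable, so shuffling `a` approximates the
    null distribution of the gap.

    NOTE: a simplified stand-in, not the paper's optimal-transport test.
    """
    rng = np.random.default_rng(seed)
    observed = parity_gap(X, a, theta)
    null = np.array([
        parity_gap(X, rng.permutation(a), theta)
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Toy example: a classifier whose score depends on the sensitive attribute.
rng = np.random.default_rng(42)
n = 500
a = rng.integers(0, 2, size=n)              # sensitive attribute
X = np.column_stack([rng.normal(size=n), a.astype(float)])
theta = np.array([1.0, 2.0])                # weight on `a` induces unfairness
gap, p = permutation_test(X, a, theta)
print(f"parity gap = {gap:.3f}, p-value = {p:.4f}")
```

With a nonzero weight on the sensitive attribute, the observed gap sits far in the tail of the permutation null, so the test rejects fairness; setting that weight to zero makes the gap vanish up to sampling noise.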
