Paper Title


Dikaios: Privacy Auditing of Algorithmic Fairness via Attribute Inference Attacks

Paper Authors

Jan Aalmoes, Vasisht Duddu, Antoine Boutet

Paper Abstract


Machine learning (ML) models have been deployed in high-stakes applications. Due to class imbalance in the sensitive attribute observed in the datasets, ML models are unfair to minority subgroups identified by a sensitive attribute, such as race and sex. In-processing fairness algorithms ensure that model predictions are independent of the sensitive attribute. Furthermore, ML models are vulnerable to attribute inference attacks, in which an adversary can identify the values of the sensitive attribute by exploiting distinguishable model predictions. Although privacy and fairness are both important pillars of trustworthy ML, the privacy risk introduced by fairness algorithms with respect to attribute leakage has not been studied. We identify attribute inference attacks as an effective measure for auditing black-box fairness algorithms, enabling model builders to account for both privacy and fairness in model design. We propose Dikaios, a privacy auditing tool for fairness algorithms aimed at model builders, which leverages a new, effective attribute inference attack that accounts for the class imbalance in sensitive attributes through an adaptive prediction threshold. We evaluate Dikaios by performing a privacy audit of two in-processing fairness algorithms over five datasets. We show that our attribute inference attacks with an adaptive prediction threshold significantly outperform prior attacks. We also highlight the limitations of in-processing fairness algorithms in ensuring indistinguishable predictions across different values of the sensitive attribute. Indeed, the attribute privacy risk of these in-processing fairness schemes varies widely with the proportion of the sensitive attribute values in the dataset. This unpredictable effect of fairness mechanisms on the attribute privacy risk is an important limitation on their use that must be accounted for by the model builder.
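The abstract describes the attack only at a high level. The sketch below is a minimal, hypothetical illustration (not the authors' Dikaios implementation) of an attribute inference attack that uses an adaptive prediction threshold to handle class imbalance in the sensitive attribute; the function names, the logistic-regression attack model, and the use of scikit-learn are all assumptions made for illustration.

```python
# Minimal sketch of an attribute inference attack with an adaptive threshold.
# The adversary observes the target model's prediction vectors and tries to
# recover a binary sensitive attribute (e.g., sex) from them.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score


def fit_attack_model(aux_predictions, aux_sensitive):
    """Train the attack model on auxiliary records where the adversary knows
    both the target model's prediction vectors and the sensitive attribute."""
    attack = LogisticRegression(max_iter=1000)
    attack.fit(aux_predictions, aux_sensitive)
    return attack


def adaptive_threshold(attack, aux_predictions, aux_sensitive):
    """Choose the score threshold that maximizes balanced accuracy on the
    auxiliary split, instead of the default 0.5, so that the minority value
    of the sensitive attribute is not systematically ignored."""
    scores = attack.predict_proba(aux_predictions)[:, 1]
    candidates = np.linspace(0.01, 0.99, 99)
    return max(
        candidates,
        key=lambda t: balanced_accuracy_score(
            aux_sensitive, (scores >= t).astype(int)
        ),
    )


def infer_sensitive_attribute(attack, threshold, target_predictions):
    """Run the attack: map the target model's prediction vectors to inferred
    sensitive-attribute values using the adaptive threshold."""
    scores = attack.predict_proba(target_predictions)[:, 1]
    return (scores >= threshold).astype(int)
```

In this sketch, the adaptive threshold is what compensates for class imbalance: with a rare sensitive value, a fixed 0.5 cutoff would predict the majority value almost everywhere, whereas calibrating the cutoff on auxiliary data recovers minority-group members as well.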
