Paper Title
The privacy issue of counterfactual explanations: explanation linkage attacks
Paper Authors
Abstract
Black-box machine learning models are being used in more and more high-stakes domains, which creates a growing need for Explainable AI (XAI). Unfortunately, the use of XAI in machine learning introduces new privacy risks, which currently remain largely unnoticed. We introduce the explanation linkage attack, which can occur when deploying instance-based strategies to find counterfactual explanations. To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a new metric to evaluate the validity of these k-anonymous counterfactual explanations. Our results show that making the explanations, rather than the whole dataset, k-anonymous is beneficial for the quality of the explanations.
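To make the abstract's two central notions concrete, here is a minimal Python sketch, not the paper's implementation: it assumes a generalized counterfactual explanation maps each feature to either an exact value or a numeric interval, that its k-anonymity is the number of dataset instances indistinguishable under the explanation, and that pureness is the fraction of those covered instances carrying the desired counterfactual class. All names (`matches`, `k_anonymity`, `pureness`) are illustrative, not from the paper.

```python
import pandas as pd

def matches(instance, generalized_cf):
    """Check whether an instance falls inside a generalized counterfactual,
    where each feature is either an exact value or a (low, high) interval."""
    for feature, constraint in generalized_cf.items():
        value = instance[feature]
        if isinstance(constraint, tuple):       # generalized numeric range
            low, high = constraint
            if not (low <= value <= high):
                return False
        elif value != constraint:               # exact (e.g. categorical) value
            return False
    return True

def k_anonymity(data, generalized_cf):
    """Number of dataset instances covered by the generalized explanation;
    an explanation is k-anonymous if this count is at least k."""
    return sum(matches(row, generalized_cf) for _, row in data.iterrows())

def pureness(data, labels, generalized_cf, desired_class):
    """Fraction of covered instances that carry the desired counterfactual
    class -- a proxy for how valid the k-anonymous explanation remains."""
    mask = data.apply(lambda row: matches(row, generalized_cf), axis=1)
    if mask.sum() == 0:
        return 0.0
    return (labels[mask] == desired_class).mean()

# Toy usage: a counterfactual generalized to intervals covers three
# training instances (3-anonymous), all of which have the desired class.
data = pd.DataFrame({"age": [25, 30, 35, 40], "income": [20, 45, 50, 60]})
labels = pd.Series([0, 1, 1, 1])
cf = {"age": (28, 42), "income": (40, 65)}
print(k_anonymity(data, cf))                         # 3
print(pureness(data, labels, cf, desired_class=1))   # 1.0
```

Under these assumptions, the trade-off the abstract alludes to is visible: widening the intervals raises k (better anonymity) but can pull in instances of the undesired class, lowering pureness.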