Title
Breaking Fair Binary Classification with Optimal Flipping Attacks
Authors
Abstract
Minimizing risk with fairness constraints is one of the popular approaches to learning a fair classifier. Recent works showed that this approach yields an unfair classifier if the training set is corrupted. In this work, we study the minimum amount of data corruption required for a successful flipping attack. First, we find lower/upper bounds on this quantity and show that these bounds are tight when the target model is the unique unconstrained risk minimizer. Second, we propose a computationally efficient data poisoning attack algorithm that can compromise the performance of fair learning algorithms.
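To make the threat model concrete, here is a minimal, self-contained sketch of a label-flipping poisoning attack against a fairness-constrained learner. It is not the paper's optimal attack algorithm: it uses a simple heuristic (flip a small budget of positive labels in one protected group), synthetic data, and a plain logistic-regression learner with a demographic-parity penalty. All function names, the penalty weight `lam`, and the flip budget are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n=2000):
    # Synthetic data: protected attribute a in {0, 1};
    # group 1 has a shifted feature distribution and higher base rate.
    a = rng.integers(0, 2, n)
    x = rng.normal(size=(n, 2)) + a[:, None] * 0.8
    logits = x @ np.array([1.5, -1.0]) + 0.5 * a
    y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(float)
    return x, a, y

def fit_fair_logreg(x, a, y, lam=5.0, lr=0.1, steps=500):
    # Logistic regression with a soft demographic-parity constraint:
    # risk + lam * (mean score in group 1 - mean score in group 0)^2.
    xb = np.hstack([x, np.ones((len(x), 1))])
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-xb @ w))
        grad = xb.T @ (p - y) / len(y)
        gap = p[a == 1].mean() - p[a == 0].mean()
        # Gradient of the group-mean-score gap w.r.t. w.
        dgap = (xb[a == 1] * (p[a == 1] * (1 - p[a == 1]))[:, None]).mean(0) \
             - (xb[a == 0] * (p[a == 0] * (1 - p[a == 0]))[:, None]).mean(0)
        w -= lr * (grad + lam * 2 * gap * dgap)
    return w

def dp_gap(w, x, a):
    # Demographic-parity gap of the thresholded classifier.
    xb = np.hstack([x, np.ones((len(x), 1))])
    yhat = (xb @ w > 0).astype(float)
    return abs(yhat[a == 1].mean() - yhat[a == 0].mean())

x, a, y = make_data()
w_clean = fit_fair_logreg(x, a, y)

# Heuristic flipping attack (illustrative, not the paper's optimal attack):
# flip a small budget of labels in group 0 from 1 to 0, pulling the
# fairness-constrained learner toward an unfair solution.
y_poison = y.copy()
idx = np.where((a == 0) & (y == 1))[0][:100]  # budget: 100 flips (~5%)
y_poison[idx] = 0.0
w_poison = fit_fair_logreg(x, a, y_poison)

print(f"DP gap, clean training set:    {dp_gap(w_clean, x, a):.3f}")
print(f"DP gap, poisoned training set: {dp_gap(w_poison, x, a):.3f}")
```

The paper's contribution is characterizing the *minimum* such flip budget (via matching lower/upper bounds) and constructing the flips optimally, whereas this sketch only demonstrates that a fixed heuristic budget changes the learned model's fairness behavior.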