论文标题
公平标记的聚类
Fair Labeled Clustering
论文作者
论文摘要
在许多不同的公平概念下聚类的基本问题已经产生了许多算法。当前最常见的概念家族可能是群体公平,在每个集群中都确保了成比例的群体代表。我们通过考虑聚类的下游应用以及如何确保在这种设置中确保组公平性来扩展此方向。具体而言,我们考虑了一个共同的环境,其中决策者运行聚类算法,检查每个集群的中心,并为其相应的群集确定适当的结果(标签)。例如,在招聘中,可能有两个结果,即阳性(雇用)或负面(拒绝),每个集群将被分配这两个结果之一。为了确保在这种情况下的团体公平性,我们希望每个标签中的比例组表示,但不一定像小组公平聚类中所做的那样。我们为此类问题提供了算法,并表明与他们在集体公平聚类中的NP-HARD对应物相比,它们允许有效的解决方案。我们还考虑了一个动机良好的替代设置,而决策者可以免费为集群分配标签,而不管中心在公制空间中的位置如何。我们表明,根据问题的其他限制,该设置从计算上难以轻松到容易的过渡表现出有趣的过渡。此外,当约束参数呈现自然值时,我们显示了这种设置的随机算法,该算法始终可以实现最佳的聚类并满足预期的公平约束。最后,我们在现实世界数据集上进行实验,以验证算法的有效性。
Numerous algorithms have been produced for the fundamental problem of clustering under many different notions of fairness. Perhaps the most common family of notions currently studied is group fairness, in which proportional group representation is ensured in every cluster. We extend this direction by considering the downstream application of clustering and how group fairness should be ensured for such a setting. Specifically, we consider a common setting in which a decision-maker runs a clustering algorithm, inspects the center of each cluster, and decides an appropriate outcome (label) for its corresponding cluster. In hiring for example, there could be two outcomes, positive (hire) or negative (reject), and each cluster would be assigned one of these two outcomes. To ensure group fairness in such a setting, we would desire proportional group representation in every label but not necessarily in every cluster as is done in group fair clustering. We provide algorithms for such problems and show that in contrast to their NP-hard counterparts in group fair clustering, they permit efficient solutions. We also consider a well-motivated alternative setting where the decision-maker is free to assign labels to the clusters regardless of the centers' positions in the metric space. We show that this setting exhibits interesting transitions from computationally hard to easy according to additional constraints on the problem. Moreover, when the constraint parameters take on natural values we show a randomized algorithm for this setting that always achieves an optimal clustering and satisfies the fairness constraints in expectation. Finally, we run experiments on real world datasets that validate the effectiveness of our algorithms.