Paper Title

Exploring Rawlsian Fairness for K-Means Clustering

Authors

Simoes, Stanley; P, Deepak; MacCarthaigh, Muiris

Abstract

We conduct an exploratory study that looks at incorporating John Rawls' ideas on fairness into existing unsupervised machine learning algorithms. Our focus is on the task of clustering, specifically the k-means clustering algorithm. To the best of our knowledge, this is the first work that uses Rawlsian ideas in clustering. To this end, we attempt to develop a postprocessing technique, i.e., one that operates on the cluster assignment generated by the standard k-means clustering algorithm. Our technique perturbs this assignment over a number of iterations to make it fairer according to Rawls' difference principle while minimally affecting the overall utility. As the first step, we consider two simple perturbation operators -- $\mathbf{R_1}$ and $\mathbf{R_2}$ -- that reassign examples in a given cluster assignment to new clusters; $\mathbf{R_1}$ reassigns a single example to a new cluster, and $\mathbf{R_2}$ reassigns a pair of examples to new clusters. Our experiments on a sample of the Adult dataset demonstrate that both operators make meaningful perturbations in the cluster assignment towards incorporating Rawls' difference principle, with $\mathbf{R_2}$ being more efficient than $\mathbf{R_1}$ in terms of the number of iterations. However, we observe that there is still a need to design operators that make significantly better perturbations. Nevertheless, both operators provide good baselines for designing and comparing any future operator, and we hope our findings will aid future work in this direction.
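The postprocessing loop described in the abstract can be illustrated with a minimal sketch. The abstract does not give the fairness measure or the acceptance rule, so everything below is an assumption made for illustration: each example's cost is taken to be its squared distance to its own cluster centroid, the difference principle is approximated by reducing the cost of the worst-off example, and "minimally affecting the overall utility" is modelled as a cap (`slack`) on how much the total cost may grow. The names `r1`, `r2`, `perturb`, `costs`, and `slack` are hypothetical and are not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(points, labels, k):
    """Mean of each cluster; an empty cluster yields None."""
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dims):
            sums[c][d] += p[d]
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(k)]


def costs(points, labels, k):
    """Assumed per-example cost: squared distance to the own centroid."""
    cents = centroids(points, labels, k)
    return [sq_dist(p, cents[c]) for p, c in zip(points, labels)]


def r1(labels, k, rng):
    """R1-style move: reassign one random example to a different cluster."""
    new = list(labels)
    i = rng.randrange(len(new))
    new[i] = rng.choice([c for c in range(k) if c != new[i]])
    return new


def r2(labels, k, rng):
    """R2-style move: reassign a random pair of examples to different clusters."""
    new = list(labels)
    for i in rng.sample(range(len(new)), 2):
        new[i] = rng.choice([c for c in range(k) if c != new[i]])
    return new


def perturb(points, labels, k, operator, iterations=1000, slack=0.05, seed=0):
    """Iteratively perturb a cluster assignment (requires k >= 2).

    Assumed acceptance rule: keep a candidate only if the worst-off
    example's cost strictly decreases and the total cost stays within
    (1 + slack) times the total cost of the initial assignment.
    """
    rng = random.Random(seed)
    best = list(labels)
    best_costs = costs(points, best, k)
    budget = sum(best_costs) * (1 + slack)
    for _ in range(iterations):
        cand = operator(best, k, rng)
        if len(set(cand)) < k:  # reject moves that empty a cluster
            continue
        cand_costs = costs(points, cand, k)
        if max(cand_costs) < max(best_costs) and sum(cand_costs) <= budget:
            best, best_costs = cand, cand_costs
    return best


if __name__ == "__main__":
    # Toy data; in practice `initial` would come from a standard k-means run.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (2.5, 2.5)]
    initial = [0, 0, 0, 1, 1, 0]
    for name, op in (("R1", r1), ("R2", r2)):
        result = perturb(pts, initial, 2, op, iterations=200, slack=0.5, seed=1)
        print(name, result, max(costs(pts, result, 2)))
```

Random proposals with a greedy accept step are used here only to keep the sketch short; the paper may select perturbations differently, and the comparison of iteration counts between the two operators reported in the abstract is not reproduced by this toy example.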
