论文标题

比率切割多层和K均值聚类

The ratio-cut polytope and K-means clustering

论文作者

De Rosa, Antonio, Khajavirad, Aida

论文摘要

我们介绍了将比率切割多层定义为比率切割向量的凸壳,该载体对应于$ \ Mathbb r^m $中的所有分区,最多是$ k $ clusters。该多层与许多聚类问题(例如K-均值聚类和光谱聚类)的可行区域的凸壳密切相关。我们研究了比率切割多层的面部结构,并得出了几种类型的方面定义不平等。然后,我们考虑K-均值聚类的问题,并为其引入一种新颖的线性编程(LP)放松。随后,我们专注于两个簇的情况,并得出了足够的条件,在该情况下,提出的LP松弛精确地恢复了基础簇。也就是说,我们考虑了随机球模型,这是一种流行的K-均值聚类的生成模型,并且我们表明,如果聚类中心之间的分离距离满足$Δ> 1+ \ sqrt 3 $,则LP松弛恢复了种植的簇,具有很高的可能性。这是对K均值聚类的LP松弛的唯一现有恢复保证的重大改进,表明当且仅当$δ> 4 $时,概率可能很高。我们的数值实验表明,所提出的LP松弛显着优于流行的半决赛编程松弛,可以恢复种植的簇。

We introduce the ratio-cut polytope defined as the convex hull of ratio-cut vectors corresponding to all partitions of $n$ points in $\mathbb R^m$ into at most $K$ clusters. This polytope is closely related to the convex hull of the feasible region of a number of clustering problems such as K-means clustering and spectral clustering. We study the facial structure of the ratio-cut polytope and derive several types of facet-defining inequalities. We then consider the problem of K-means clustering and introduce a novel linear programming (LP) relaxation for it. Subsequently, we focus on the case of two clusters and derive sufficient conditions under which the proposed LP relaxation recovers the underlying clusters exactly. Namely, we consider the stochastic ball model, a popular generative model for K-means clustering, and we show that if the separation distance between cluster centers satisfies $Δ> 1+\sqrt 3$, then the LP relaxation recovers the planted clusters with high probability. This is a major improvement over the only existing recovery guarantee for an LP relaxation of K-means clustering stating that recovery is possible with high probability if and only if $Δ> 4$. Our numerical experiments indicate that the proposed LP relaxation significantly outperforms a popular semidefinite programming relaxation in recovering the planted clusters.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源