论文标题

关于Euclidean $(K,Z)$的最佳核心结构

On Optimal Coreset Construction for Euclidean $(k,z)$-Clustering

论文作者

Huang, Lingxiao, Li, Jian, Wu, Xuan

论文摘要

在过去的十年中,在不同度量空间中构建小型核心用于各种聚类问题。核心文献中的一个核心问题是了解$(k,z)$ - 聚类在欧几里得空间中的最佳核心大小。尽管该问题取得了重大进展,但最新的上限和下限之间仍然存在差距。例如,以$ k $ -means($ z = 2 $)为$ \ min \ {o(k^{3/2} \ varepsilon^{ - 2}),o(k \ varepsilon^{ - 4} { - 4})\ \ var vary($ var)($ var)(k)(k)(k)(2), [1]。 在本文中,我们在上限和下限都取得了重大进展。对于各种参数(即$ \ varepsilon,k $),我们对最佳内核大小有了完全的了解。特别是,我们获得以下结果: (1)我们提出了一个新的核心下限$ω(k \ varepsilon^{ - z-2})$ for Euclidean $(k,z)$ - $ \ varepsilon \ varepsilon \ geqω(k^{-1/(z+2)})$当时聚类。鉴于先前的上限$ \ tilde {o} _z(k \ varepsilon^{ - z-2})$ [1],界限是最佳的。新的下边界还意味着$(k,z)$的下限改进了两倍的指标。 (2)对于上限,我们为$(k,z)$的有效核心结构算法 - 聚类,在几个公制空间中进行改进或最佳核心尺寸。特别是,我们提供了一个$ \ tilde {o} _z(k^{\ frac {\ frac {2z+2} {z+2}} \ varepsilon^{ - 2})$ - 大小的核心,并带有不层面的分析,用于$(k,z)$ - clustering $ z z)$ z \ geq \ geq 1 $ $ euce n euclidean。 [1] Cohen-Addad,Larsen,Saulpic,Schwiegelshohn。 stoc'22。 [2] Cohen-Addad,Larsen,Saulpic,Schwiegelshohn,Sheikh-Omar,Neurips'22。

Constructing small-sized coresets for various clustering problems in different metric spaces has attracted significant attention for the past decade. A central problem in the coreset literature is to understand what is the best possible coreset size for $(k,z)$-clustering in Euclidean space. While there has been significant progress in the problem, there is still a gap between the state-of-the-art upper and lower bounds. For instance, the best known upper bound for $k$-means ($z=2$) is $\min \{O(k^{3/2} \varepsilon^{-2}),O(k \varepsilon^{-4})\}$ [1,2], while the best known lower bound is $Ω(k\varepsilon^{-2})$ [1]. In this paper, we make significant progress on both upper and lower bounds. For a large range of parameters (i.e., $\varepsilon, k$), we have a complete understanding of the optimal coreset size. In particular, we obtain the following results: (1) We present a new coreset lower bound $Ω(k \varepsilon^{-z-2})$ for Euclidean $(k,z)$-clustering when $\varepsilon \geq Ω(k^{-1/(z+2)})$. In view of the prior upper bound $\tilde{O}_z(k \varepsilon^{-z-2})$ [1], the bound is optimal. The new lower bound also implies improved lower bounds for $(k,z)$-clustering in doubling metrics. (2) For the upper bound, we provide efficient coreset construction algorithms for $(k,z)$-clustering with improved or optimal coreset sizes in several metric spaces. In particular, we provide an $\tilde{O}_z(k^{\frac{2z+2}{z+2}} \varepsilon^{-2})$-sized coreset, with a unfied analysis, for $(k,z)$-clustering for all $z\geq 1$ in Euclidean space. [1] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22. [2] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar, NeurIPS'22.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源