Paper title

Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering

Authors

Debolina Paul, Saptarshi Chakraborty, Didong Li, David Dunson

Abstract

Even with the rise in popularity of over-parameterized models, simple dimensionality reduction and clustering methods, such as PCA and k-means, are still routinely used in an amazing variety of settings. A primary reason is the combination of simplicity, interpretability and computational efficiency. The focus of this article is on improving upon PCA and k-means, by allowing non-linear relations in the data and more flexible cluster shapes, without sacrificing the key advantages. The key contribution is a new framework for Principal Elliptical Analysis (PEA), defining a simple and computationally efficient alternative to PCA that fits the best elliptical approximation through the data. We provide theoretical guarantees on the proposed PEA algorithm using Vapnik-Chervonenkis (VC) theory to show strong consistency and uniform concentration bounds. Toy experiments illustrate the performance of PEA, and the ability to adapt to non-linear structure and complex cluster shapes. In a rich variety of real data clustering applications, PEA is shown to do as well as k-means for simple datasets, while dramatically improving performance in more complex settings.
