Paper title
Homogeneity and Sub-homogeneity Pursuit: Iterative Complement Clustering PCA
Authors
Abstract
Principal component analysis (PCA), the most popular dimension-reduction technique, has been used to analyze high-dimensional data in many areas. It discovers the homogeneity within the data and creates a reduced feature space to capture as much information as possible from the original data. However, in the presence of a group structure of the data, PCA often fails to identify the group-specific pattern, which is known as sub-homogeneity in this study. Group-specific information that is missed can result in an unsatisfactory representation of the data from a particular group. It is important to capture both homogeneity and sub-homogeneity in high-dimensional data analysis, but this poses a great challenge. In this study, we propose a novel iterative complement-clustering principal component analysis (CPCA) to iteratively estimate the homogeneity and sub-homogeneity. A principal component regression based clustering method is also introduced to provide reliable information about clusters. Theoretically, this study shows that our proposed clustering approach can correctly identify the cluster membership under certain conditions. The simulation study and real analysis of the stock return data confirm the superior performance of our proposed methods.
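The idea of estimating a common structure first and then looking for group-specific structure in what remains can be illustrated with a minimal sketch. This is not the authors' CPCA algorithm: it performs a single pass rather than iterating, it assumes the groups partition the variables (columns) and that the group labels are already known, and it omits the principal component regression based clustering step entirely. The function names `top_components` and `complement_pca`, the synthetic data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def top_components(X, k):
    """Return the top-k principal directions (as columns) of the data X (n x p)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


def complement_pca(X, labels, k_global=1, k_group=1):
    """Single pass: global PCA, then PCA on each group's block of the residual.

    Assumption for this sketch: `labels` assigns each column of X to a known group.
    """
    Xc = X - X.mean(axis=0)
    V = top_components(Xc, k_global)
    common = Xc @ V @ V.T            # part explained by the global components
    residual = Xc - common           # the "complement" left after global PCA
    group_part = np.zeros_like(Xc)
    for g in np.unique(labels):
        cols = labels == g
        Vg = top_components(residual[:, cols], k_group)
        group_part[:, cols] = residual[:, cols] @ Vg @ Vg.T
    return common, group_part


# Synthetic data (illustrative only): one strong common factor plus
# one weaker factor per group of variables, plus small noise.
rng = np.random.default_rng(0)
n, p_per, n_groups = 200, 10, 3
labels = np.repeat(np.arange(n_groups), p_per)
X = 3.0 * rng.normal(size=(n, 1)) @ rng.normal(size=(1, p_per * n_groups))
for g in range(n_groups):
    cols = labels == g
    X[:, cols] += rng.normal(size=(n, 1)) @ rng.normal(size=(1, p_per))
X += 0.1 * rng.normal(size=X.shape)

Xc = X - X.mean(axis=0)
common, group_part = complement_pca(X, labels)
err_common = np.linalg.norm(Xc - common)              # global PCA only
err_both = np.linalg.norm(Xc - common - group_part)   # global + group-specific
print(f"error, global only: {err_common:.2f}; with group parts: {err_both:.2f}")
```

In this toy setting, the group-wise components computed on the residual recover variation that the single global component leaves behind, which is the kind of group-specific pattern the abstract calls sub-homogeneity.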