Paper title
Automatic sparse PCA for high-dimensional data
Paper authors
Paper abstract
Sparse principal component analysis (SPCA) methods have proven effective for analyzing high-dimensional data. Among them, threshold-based SPCA (TSPCA) is computationally cheaper than regularized SPCA based on L1 penalties. We investigate the efficacy of TSPCA in high-dimensional data settings and show that, for a suitable threshold value, TSPCA achieves satisfactory performance. The performance of TSPCA therefore depends heavily on the selected threshold value. To address this, we propose a novel thresholding estimator that obtains the principal component (PC) directions using a customized noise-reduction methodology. The proposed technique is consistent under mild conditions and unaffected by threshold values, and therefore yields more accurate results quickly at a lower computational cost. Furthermore, we explore the shrinkage PC directions and their application in clustering high-dimensional data. Finally, we evaluate the performance of the estimated shrinkage PC directions in actual data analyses.
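To make the role of the threshold concrete, the following is a minimal sketch of generic threshold-based sparse PCA: compute the leading PC direction, set loadings below a cutoff to zero, and renormalize. It is not the automatic estimator or the noise-reduction methodology proposed in the paper, whose details the abstract does not give. The function name `thresholded_pc_direction`, the `threshold` parameter, and the simulated data are assumptions made purely for illustration.

```python
import numpy as np


def thresholded_pc_direction(X, threshold):
    """Leading PC direction with small loadings hard-thresholded to zero.

    X is an (n_samples, n_features) array. `threshold` is a hypothetical,
    user-chosen cutoff; the paper's point is that this choice matters.
    """
    Xc = X - X.mean(axis=0)
    # The first right singular vector of the centered data is the leading
    # eigenvector of the sample covariance matrix.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    v = vt[0]
    # Hard thresholding: keep only loadings whose magnitude reaches the cutoff.
    sparse = np.where(np.abs(v) >= threshold, v, 0.0)
    norm = np.linalg.norm(sparse)
    if norm == 0.0:
        raise ValueError("threshold removed every loading")
    return sparse / norm


# Simulated high-dimensional data (assumed setup): n = 20 samples,
# d = 200 features, with signal confined to the first 5 coordinates.
rng = np.random.default_rng(0)
n, d, k = 20, 200, 5
true_dir = np.zeros(d)
true_dir[:k] = 1.0 / np.sqrt(k)
scores = rng.normal(scale=10.0, size=(n, 1))
X = scores * true_dir + rng.normal(size=(n, d))

est = thresholded_pc_direction(X, threshold=0.2)
print("nonzero loadings:", np.flatnonzero(est))
```

Rerunning the last two lines with different `threshold` values changes which loadings survive, which is the sensitivity that motivates estimating the threshold from the data rather than fixing it by hand.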