Paper Title

Robust self-tuning semiparametric PCA for contaminated elliptical distribution

Authors

Hung, Hung; Huang, Su-Yun; Eguchi, Shinto

Abstract

Principal component analysis (PCA) is one of the most popular dimension reduction methods. The usual PCA is known to be sensitive to outliers, and many robust PCA methods have therefore been developed. Among them, Tyler's M-estimator has been shown to be the most robust scatter estimator under elliptical distributions. However, when the underlying distribution is contaminated and deviates from ellipticity, Tyler's M-estimator might not work well. In this article, we apply semiparametric theory to propose a robust semiparametric PCA. The merits of our proposal are twofold. First, it is robust both to heavy-tailed elliptical distributions and to non-elliptical outliers. Second, it pairs well with a data-driven tuning procedure that is based on the active ratio and can adapt to different degrees of data outlyingness. Theoretical properties are derived, including the influence functions of various statistical functionals and asymptotic normality. Simulation studies and a data analysis demonstrate the superiority of our method.
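
For orientation, the baseline named in the abstract, Tyler's M-estimator, can be computed with the standard fixed-point iteration V <- (p/n) * sum_i x_i x_i^T / (x_i^T V^{-1} x_i), and robust principal directions are then the leading eigenvectors of the estimated shape matrix. This is a minimal sketch of that baseline only, not the authors' semiparametric PCA or their active-ratio tuning procedure, whose details the abstract does not give. It assumes the location is known and equal to zero (center the data beforehand), that no observation lies exactly at the origin, and that scale is fixed by trace(V) = p. The function names `tyler_m_estimator` and `robust_pcs` are hypothetical.

```python
import numpy as np


def tyler_m_estimator(X, tol=1e-8, max_iter=500):
    """Tyler's M-estimator of shape (scatter up to scale).

    Assumption: location is known and zero, and no row of X is all zeros.
    Scale is fixed by normalizing trace(V) = p at every iteration.
    """
    n, p = X.shape
    V = np.eye(p)
    for _ in range(max_iter):
        Vinv = np.linalg.inv(V)
        # Squared Mahalanobis-type distances d_i = x_i^T V^{-1} x_i
        d = np.einsum("ij,jk,ik->i", X, Vinv, X)
        # Weighted sum of outer products: (p/n) * sum_i x_i x_i^T / d_i
        V_new = (p / n) * (X / d[:, None]).T @ X
        V_new *= p / np.trace(V_new)  # fix the arbitrary scale
        if np.linalg.norm(V_new - V, ord="fro") < tol:
            return V_new
        V = V_new
    return V


def robust_pcs(X, k):
    """Top-k eigenvalues/eigenvectors of Tyler's shape matrix (hypothetical helper)."""
    V = tyler_m_estimator(X)
    eigvals, eigvecs = np.linalg.eigh(V)  # ascending order for symmetric V
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]
```

Because each term x_i x_i^T / (x_i^T V^{-1} x_i) is unchanged when x_i is rescaled, the estimator depends on each observation only through its direction, which is why heavy radial tails do not affect it. As the abstract notes, this protection does not extend to contamination that breaks ellipticity, which is the setting the proposed method targets.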