Paper Title
Prediction in latent factor regression: Adaptive PCR and beyond
Paper Authors
Paper Abstract
This work is devoted to the finite sample prediction risk analysis of a class of linear predictors of a response $Y\in \mathbb{R}$ from a high-dimensional random vector $X\in \mathbb{R}^p$ when $(X,Y)$ follows a latent factor regression model generated by an unobservable latent vector $Z$ of dimension less than $p$. Our primary contribution is in establishing finite sample risk bounds for prediction with the ubiquitous Principal Component Regression (PCR) method, under the factor regression model, with the number of principal components adaptively selected from the data -- a form of theoretical guarantee that is surprisingly lacking from the PCR literature. To accomplish this, we prove a master theorem that establishes a risk bound for a large class of predictors, including the PCR predictor as a special case. This approach has the benefit of providing a unified framework for the analysis of a wide range of linear prediction methods, under the factor regression setting. In particular, we use our main theorem to recover known risk bounds for the minimum-norm interpolating predictor, which has received renewed attention in the past two years, and a prediction method tailored to a subclass of factor regression models with identifiable parameters. This model-tailored method can be interpreted as prediction via clusters with latent centers. To address the problem of selecting among a set of candidate predictors, we analyze a simple model selection procedure based on data-splitting, providing an oracle inequality under the factor model to prove that the performance of the selected predictor is close to the optimal candidate. We conclude with a detailed simulation study to support and complement our theoretical results.
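To make the abstract's pipeline concrete, the sketch below illustrates PCR under a latent factor model with an adaptively chosen number of components and a data-splitting step that selects among candidate predictors by held-out risk. This is only an illustrative sketch under assumed conventions: the eigenvalue-ratio rule for choosing the number of components, the candidate set, and the squared-error validation criterion are placeholders, not the paper's exact estimators or selection rule.

```python
# Minimal sketch of adaptive PCR + data-splitting selection under a factor model.
# All specific choices (ratio rule, candidate grid, 50/50 split) are illustrative
# assumptions, not the procedures analyzed in the paper.
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression with k components."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                       # p x k estimated loading directions
    scores = X @ V_k                     # n x k estimated factor scores
    theta, *_ = np.linalg.lstsq(scores, y, rcond=None)
    return V_k @ theta                   # coefficient vector in R^p

def select_k(X, max_k=30):
    """Illustrative adaptive choice of k: largest ratio of consecutive
    singular values (a common factor-number heuristic)."""
    s = np.linalg.svd(X, compute_uv=False)
    max_k = min(max_k, len(s) - 1)
    ratios = s[:max_k] / s[1:max_k + 1]
    return int(np.argmax(ratios)) + 1

def select_by_splitting(X, y, candidate_ks, split=0.5, seed=0):
    """Pick among candidate PCR predictors by empirical risk on a held-out split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_tr = int(split * len(y))
    tr, te = idx[:n_tr], idx[n_tr:]
    best_beta, best_risk = None, np.inf
    for k in candidate_ks:
        beta = pcr_fit(X[tr], y[tr], k)
        risk = np.mean((y[te] - X[te] @ beta) ** 2)
        if risk < best_risk:
            best_beta, best_risk = beta, risk
    return best_beta, best_risk

# Synthetic factor-regression data: X = Z A^T + W,  y = Z beta + eps.
rng = np.random.default_rng(0)
n, p, K = 300, 500, 5
Z = rng.normal(size=(n, K))
A = rng.normal(size=(p, K))
X = Z @ A.T + 0.5 * rng.normal(size=(n, p))
y = Z @ rng.normal(size=K) + 0.1 * rng.normal(size=n)

k_hat = select_k(X)
beta_hat, risk_hat = select_by_splitting(X, y, candidate_ks=range(1, 11))
print(f"adaptive k = {k_hat}, held-out risk of selected predictor = {risk_hat:.3f}")
```

The two stages mirror the abstract's structure: an adaptive choice of the number of principal components for the PCR predictor, followed by a data-splitting comparison whose held-out risk plays the role that the oracle inequality controls in the theory.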