On Model Identification and Out-of-Sample Prediction of Principal Component Regression: Applications to Synthetic Controls

Paper Title

On Model Identification and Out-of-Sample Prediction of Principal Component Regression: Applications to Synthetic Controls

Authors

Agarwal, Anish; Shah, Devavrat; Shen, Dennis

Abstract

We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. Under suitable conditions, we show that PCR consistently identifies the unique model with minimum $\ell_2$-norm. These results enable us to establish non-asymptotic out-of-sample prediction guarantees that improve upon the best-known rates. In the course of our analysis, we introduce a natural linear algebraic condition between the in- and out-of-sample covariates, which allows us to avoid distributional assumptions for out-of-sample predictions. Our simulations illustrate the importance of this condition for generalization, even under covariate shifts. Accordingly, we construct a hypothesis test to check whether this condition holds in practice. As a byproduct, our results also lead to novel results for the synthetic controls literature, a leading approach to policy evaluation. To the best of our knowledge, our prediction guarantees for the fixed design setting have been elusive in both the high-dimensional error-in-variables and synthetic controls literatures.
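To make the abstract's two central objects concrete, here is a minimal NumPy sketch, not the authors' code. It shows (1) PCR as least squares on the rank-k SVD truncation of the covariates, which returns the minimum $\ell_2$-norm coefficient vector within the top-k principal subspace, and (2) a simple diagnostic for a row-space-inclusion style condition between in- and out-of-sample covariates. Assumptions: the function names `pcr_fit`, `pcr_predict`, and `subspace_residual` are hypothetical; the rank `k` is taken as known; the test covariates are denoised by their own rank-k truncation; and the data are noiseless so the behavior can be checked exactly. The residual diagnostic is a heuristic, not the paper's hypothesis test.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Regress y on the rank-k SVD truncation of X (principal component regression).

    Returns beta_hat = V_k S_k^{-1} U_k^T y, the minimum l2-norm least-squares
    solution for the rank-k approximation of X, plus the top-k right singular vectors.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    beta_hat = Vt_k.T @ ((U_k.T @ y) / s_k)
    return beta_hat, Vt_k


def pcr_predict(Z, beta_hat, k):
    """Predict on out-of-sample covariates after a rank-k truncation of Z (an assumption)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    Z_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return Z_k @ beta_hat


def subspace_residual(Z, Vt_k):
    """Fraction of Z's Frobenius norm lying outside the top-k row space of X (heuristic)."""
    proj = Z @ Vt_k.T @ Vt_k
    return np.linalg.norm(Z - proj) / np.linalg.norm(Z)


# Noiseless toy example: low-rank covariates sharing the row space spanned by B.
rng = np.random.default_rng(0)
n, p, r = 200, 50, 3
B = rng.normal(size=(r, p))
X = rng.normal(size=(n, r)) @ B            # rank-r training covariates
beta_star = rng.normal(size=p)             # an arbitrary, not minimum-norm, model
y = X @ beta_star

beta_hat, Vt_k = pcr_fit(X, y, k=r)
# The minimum-norm model is the projection of beta_star onto rowspan(X).
beta_min_norm = Vt_k.T @ (Vt_k @ beta_star)

Z_in = rng.normal(size=(20, r)) @ B        # test covariates inside rowspan(X)
Z_out = rng.normal(size=(20, p))           # generic test covariates: condition fails

print(np.allclose(beta_hat, beta_min_norm))
print(subspace_residual(Z_in, Vt_k), subspace_residual(Z_out, Vt_k))
```

For `Z_in`, predictions from `beta_hat` match those of `beta_star` even though the two coefficient vectors differ, because only the component of the model inside the shared row space affects them; for `Z_out` the residual is large and no such guarantee holds.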
