Paper Title

Missing Values Handling for Machine Learning Portfolios

Authors

Chen, Andrew Y.; McCoy, Jack

Abstract

We characterize the structure and origins of missingness for 159 cross-sectional return predictors and study missing value handling for portfolios constructed using machine learning. Simply imputing with cross-sectional means performs well compared to rigorous expectation-maximization methods. This stems from three facts about predictor data: (1) missingness occurs in large blocks organized by time, (2) cross-sectional correlations are small, and (3) missingness tends to occur in blocks organized by the underlying data source. As a result, observed data provide little information about missing data. Sophisticated imputations introduce estimation noise that can lead to underperformance if machine learning is not carefully applied.
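To make the simple baseline concrete, here is a minimal sketch of cross-sectional mean imputation: within each date, every missing predictor value is replaced by the mean of that predictor across the stocks observed on that date. This is an illustration only, not the authors' code. The column names (`date`, `bm`, `mom`), the toy values, and the function name `impute_cross_sectional_mean` are all hypothetical.

```python
import numpy as np
import pandas as pd


def impute_cross_sectional_mean(panel, predictor_cols, date_col="date"):
    """Fill missing predictor values with the same-date mean across stocks.

    Hypothetical helper for illustration; assumes one row per stock-date.
    """
    out = panel.copy()
    # Per-date mean of each predictor, broadcast back to every row.
    # NaNs are skipped when the mean is computed.
    means = out.groupby(date_col)[predictor_cols].transform("mean")
    # Only missing cells are filled; observed values are left untouched.
    out[predictor_cols] = out[predictor_cols].fillna(means)
    return out


# Toy panel (made-up numbers): three stocks in the first month, two in the second.
panel = pd.DataFrame(
    {
        "date": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02"],
        "bm": [1.0, np.nan, 3.0, np.nan, 4.0],
        "mom": [np.nan, 0.25, 0.75, np.nan, np.nan],
    }
)

filled = impute_cross_sectional_mean(panel, predictor_cols=["bm", "mom"])
print(filled)
```

In this sketch, a predictor that is missing for every stock on a given date stays missing, because there is no cross-sectional mean to fill it with. That corresponds to the abstract's first fact, that missingness occurs in large blocks organized by time, and such cases would need a separate rule.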
