Missing Values Handling for Machine Learning Portfolios

Paper Title

Missing Values Handling for Machine Learning Portfolios

Authors

Chen, Andrew Y., McCoy, Jack

Abstract

We characterize the structure and origins of missingness for 159 cross-sectional return predictors and study missing value handling for portfolios constructed using machine learning. Simply imputing with cross-sectional means performs well compared to rigorous expectation-maximization methods. This stems from three facts about predictor data: (1) missingness occurs in large blocks organized by time, (2) cross-sectional correlations are small, and (3) missingness tends to occur in blocks organized by the underlying data source. As a result, observed data provide little information about missing data. Sophisticated imputations introduce estimation noise that can lead to underperformance if machine learning is not carefully applied.
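As a rough illustration of what "imputing with cross-sectional means" could look like in practice, the sketch below fills each missing predictor value with the average of that predictor across all stocks observed on the same date. It is not the authors' code: the long-format panel layout, the function name `impute_cross_sectional_mean`, and the column names `date`, `stock`, `bm`, and `mom` are assumptions made only for this example.

```python
import numpy as np
import pandas as pd


def impute_cross_sectional_mean(panel, date_col, predictor_cols):
    """Fill missing predictor values with the same-date mean across stocks.

    Hypothetical helper for illustration; assumes one row per (date, stock).
    """
    out = panel.copy()
    # Per-date mean of each predictor, broadcast back to every row.
    # NaN values are skipped when the mean is computed.
    means = out.groupby(date_col)[predictor_cols].transform("mean")
    # Replace only the missing entries; observed values are left untouched.
    out[predictor_cols] = out[predictor_cols].fillna(means)
    return out


# Toy panel (made-up numbers, two dates, up to three stocks).
panel = pd.DataFrame({
    "date": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02"],
    "stock": ["A", "B", "C", "A", "B"],
    "bm": [0.5, np.nan, 1.5, 2.0, np.nan],
    "mom": [np.nan, 0.1, 0.3, np.nan, np.nan],
})

filled = impute_cross_sectional_mean(panel, "date", ["bm", "mom"])
print(filled)
```

Note that a predictor with no observations at all on a given date (here `mom` in `2020-02`) has no cross-sectional mean and stays missing in this sketch; a real pipeline would need an explicit rule for that case.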
