Paper title
Missing Values Handling for Machine Learning Portfolios
Paper authors
Paper abstract
We characterize the structure and origins of missingness for 159 cross-sectional return predictors and study missing value handling for portfolios constructed using machine learning. Simply imputing with cross-sectional means performs well compared to rigorous expectation-maximization methods. This stems from three facts about predictor data: (1) missingness occurs in large blocks organized by time, (2) cross-sectional correlations are small, and (3) missingness tends to occur in blocks organized by the underlying data source. As a result, observed data provide little information about missing data. Sophisticated imputations introduce estimation noise that can lead to underperformance if machine learning is not carefully applied.
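As a rough illustration of the simple baseline the abstract refers to, the sketch below fills each missing predictor value with the mean of that predictor across stocks on the same date. It is not the authors' implementation: the function name `impute_cross_sectional_mean`, the field names (`date`, `stock`, `bm`, `mom`), and the toy data are all hypothetical, and the choice to leave a value missing when no stock reports that predictor on a date is an assumption made here.

```python
import math
from collections import defaultdict


def impute_cross_sectional_mean(rows, date_key, predictor_keys):
    """Fill missing predictor values with the same-date mean across stocks.

    `rows` is a list of dicts, one per stock-date observation. A value counts
    as missing if it is None or NaN. (Hypothetical helper for illustration.)
    """
    # Collect the observed values for every (date, predictor) pair.
    observed = defaultdict(list)
    for row in rows:
        for key in predictor_keys:
            value = row[key]
            if value is not None and not math.isnan(value):
                observed[(row[date_key], key)].append(value)

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input is left unchanged
        for key in predictor_keys:
            value = row[key]
            if value is None or math.isnan(value):
                values = observed.get((row[date_key], key))
                if values:
                    new_row[key] = sum(values) / len(values)
                # Assumption: if nothing is observed on this date,
                # the value simply stays missing.
        filled.append(new_row)
    return filled


# Toy panel: two dates, two predictors, several missing entries.
panel = [
    {"date": "2020-01", "stock": "A", "bm": 0.5, "mom": math.nan},
    {"date": "2020-01", "stock": "B", "bm": math.nan, "mom": 0.1},
    {"date": "2020-01", "stock": "C", "bm": 1.5, "mom": 0.3},
    {"date": "2020-02", "stock": "A", "bm": 2.0, "mom": math.nan},
    {"date": "2020-02", "stock": "B", "bm": math.nan, "mom": math.nan},
]

filled = impute_cross_sectional_mean(panel, "date", ["bm", "mom"])
```

Because the fill uses only same-date averages, it involves no estimation of cross-predictor relationships, which is the contrast the abstract draws with expectation-maximization methods.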