Paper title

Solving the "many variables" problem in MICE with principal component regression

Authors

Edoardo Costantini, Kyle M. Lang, Klaas Sijtsma, Tim Reeskens

Abstract

Multiple imputation (MI) is one of the most popular approaches to addressing missing values in questionnaires and surveys. MI with multivariate imputation by chained equations (MICE) allows flexible imputation of many types of data. In MICE, for each variable under imputation, the imputer needs to specify which variables should act as predictors in the imputation model. The selection of these predictors is a difficult but fundamental step in the MI procedure, especially when a data set contains many variables. In this project, we explore the use of principal component regression (PCR) as a univariate imputation method in the MICE algorithm to automatically address the "many variables" problem that arises when imputing large social science data sets. We compare different implementations of PCR-based MICE with a correlation-thresholding strategy by means of a Monte Carlo simulation study and a case study. We find that using PCR on a variable-by-variable basis performs best and can come close to the performance of expertly designed imputation procedures.
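To make the idea of PCR as a univariate imputation method concrete, the following is a minimal sketch of a single imputation step for one variable. It is not the authors' implementation: the function name `pcr_impute`, the simulated data, and the choice of two components are assumptions made for illustration. The sketch also assumes fully observed predictors and adds only residual noise, whereas a proper MI procedure would additionally draw the regression parameters to reflect their uncertainty.

```python
import numpy as np


def pcr_impute(y, X, n_components, rng):
    """Fill NaN entries of y by regressing y on principal components of X.

    Hypothetical helper for illustration; X is assumed to be fully observed.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(y)

    # Standardize predictors so no variable dominates the components by scale.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Principal directions from the SVD; scores are projections onto them.
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    scores = Xs @ vt[:n_components].T

    # Least-squares fit of the observed y on the component scores plus intercept.
    design = np.column_stack([np.ones(len(y)), scores])
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)

    # Residual standard deviation estimated from the observed cases.
    resid = y[observed] - design[observed] @ beta
    dof = max(int(observed.sum()) - design.shape[1], 1)
    sigma = np.sqrt(resid @ resid / dof)

    # Impute: predicted value plus random noise for each missing case.
    imputed = y.copy()
    n_missing = int((~observed).sum())
    imputed[~observed] = design[~observed] @ beta + rng.normal(0.0, sigma, n_missing)
    return imputed


# Simulated example (assumed data): 30 predictors driven by 2 latent factors.
rng = np.random.default_rng(0)
n, p = 200, 30
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.5 * rng.normal(size=(n, p))
y = latent[:, 0] + 0.3 * rng.normal(size=n)

y_missing = y.copy()
y_missing[rng.random(n) < 0.2] = np.nan  # roughly 20% missing at random

y_imputed = pcr_impute(y_missing, X, n_components=2, rng=rng)
```

The point of the design is that the imputer does not pick predictors by hand: all 30 candidate predictors are summarized by a few components, and only those components enter the imputation model. Within a chained-equations loop, a step like this would be repeated for each incomplete variable in turn.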
