Paper title

Multiple imputation for logistic regression models: incorporating an interaction

Authors

Smith, Matthew J., Quartagno, Matteo, Njagi, Edmund Njeru

Abstract

Background: Multiple imputation is often used to reduce bias and gain efficiency when there is missing data. The most appropriate imputation method depends on the model the analyst is interested in fitting. Several imputation approaches have been proposed for when this model is a logistic regression model with an interaction term that contains a binary partially observed variable; however, it is not clear which performs best under certain parameter settings. Methods: Using 1000 simulations, each with 10,000 observations, under six data-generating mechanisms (DGMs), we investigate the performance of four methods: (i) 'passive imputation', (ii) 'just another variable' (JAV), (iii) 'stratify-impute-append' (SIA), and (iv) 'substantive model compatible fully conditional specification' (SMCFCS). The application of each method is shown in an empirical example using England-based cancer registry data. Results: SMCFCS and SIA gave the least biased estimates of the coefficients for the fully and partially observed variables and the interaction term. SMCFCS and SIA showed good coverage and low relative error for all DGMs. SMCFCS had a large bias when there was a low prevalence of the fully observed variable in the interaction. SIA performed poorly when the fully observed variable in the interaction had a continuous underlying form. Conclusion: SMCFCS and SIA give consistent estimation for logistic regression models with an interaction term when data are missing at random, and either can be used in most analyses. SMCFCS performed better than SIA when the fully observed variable in the interaction had an underlying continuous form. Researchers should be cautious when using SMCFCS when there is a low prevalence of the fully observed variable in the interaction.
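
The abstract names the 'stratify-impute-append' (SIA) approach without describing its steps. The following is a minimal sketch of the general idea only, not the authors' implementation: split the data by the fully observed binary variable in the interaction, impute the partially observed binary variable separately within each stratum, then recombine the strata. The function name `stratify_impute_append`, the simulated coefficients, the missingness probabilities, and the within-stratum imputation model (a saturated model for x given y, with the probability drawn from a Beta posterior under a Jeffreys prior) are all assumptions made for illustration.

```python
import numpy as np


def stratify_impute_append(y, z, x, m=5, seed=0):
    """Hypothetical helper: return m completed copies of x.

    y: binary outcome, fully observed
    z: binary covariate, fully observed (the stratifying variable)
    x: binary covariate stored as floats, with np.nan marking missing values
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(x)
    completed = []
    for _ in range(m):
        x_imp = x.copy()
        for z_val in (0, 1):  # stratify on the fully observed variable
            for y_val in (0, 1):  # assumed saturated model: x given y
                cell = (z == z_val) & (y == y_val)
                obs = cell & ~missing
                mis = cell & missing
                n_mis = int(mis.sum())
                if n_mis == 0:
                    continue
                k = x[obs].sum()  # observed ones in this cell
                n = obs.sum()     # observed values in this cell
                # Draw the cell probability from its Beta posterior so that
                # parameter uncertainty carries into the imputations.
                p = rng.beta(k + 0.5, n - k + 0.5)
                x_imp[mis] = rng.binomial(1, p, size=n_mis)
        # "Append": both strata are written back into one array, which is
        # equivalent to stacking the separately imputed strata.
        completed.append(x_imp)
    return completed


# Simulated example (all numeric values below are illustrative assumptions).
rng = np.random.default_rng(42)
n_obs = 2000
z = rng.binomial(1, 0.5, size=n_obs)
x_full = rng.binomial(1, 0.4, size=n_obs).astype(float)
linpred = -1.0 + 0.5 * x_full + 0.5 * z + 1.0 * x_full * z
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-linpred)))

# Missingness in x depends only on the fully observed y and z (missing at random).
p_miss = 0.1 + 0.2 * y + 0.1 * z
x = x_full.copy()
x[rng.random(n_obs) < p_miss] = np.nan

completed = stratify_impute_append(y, z, x, m=5, seed=1)
```

In a full analysis, the logistic regression with the interaction term would be fitted to each completed dataset and the estimates pooled with Rubin's rules. Because imputation runs separately within each level of z, the association between x and y is free to differ across strata, which is what allows the interaction to be preserved.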
