Double sampling for informatively missing data in electronic health record-based comparative effectiveness research

Paper title

Double sampling for informatively missing data in electronic health record-based comparative effectiveness research

Authors

Levis, Alexander W., Mukherjee, Rajarshi, Wang, Rui, Fischer, Heidi, Haneuse, Sebastien

Abstract

Missing data arise in most applied settings and are ubiquitous in electronic health records (EHR). When data are missing not at random (MNAR) with respect to measured covariates, sensitivity analyses are often considered. These post-hoc solutions, however, are often unsatisfying in that they are not guaranteed to yield concrete conclusions. Motivated by an EHR-based study of long-term outcomes following bariatric surgery, we consider the use of double sampling as a means to mitigate MNAR outcome data when the statistical goals are estimation and inference regarding causal effects. We describe assumptions that are sufficient for the identification of the joint distribution of confounders, treatment, and outcome under this design. Additionally, we derive efficient and robust estimators of the average causal treatment effect under a nonparametric model and under a model assuming outcomes were, in fact, initially missing at random (MAR). We compare these in simulations to an approach that adaptively estimates based on evidence of violation of the MAR assumption. Finally, we also show that the proposed double sampling design can be extended to handle arbitrary coarsening mechanisms, and derive nonparametric efficient estimators of any smooth full data functional.
