Strategy to select most efficient RCT samples based on observational data

Paper title

Strategy to select most efficient RCT samples based on observational data

Authors

Shi, Wenqi; Lin, Xi

Abstract

Randomized experiments can provide unbiased estimates of sample average treatment effects. However, estimates of population treatment effects can be biased when the experimental sample and the target population differ. In this case, the population average treatment effect can be identified by combining experimental and observational data. A good experimental design trumps all the analyses that come after it. While most of the existing literature centers on improving analyses after RCTs, we instead focus on the design stage, fundamentally improving the efficiency of the combined causal estimator through the selection of experimental samples. We explore how the covariate distribution of RCT samples influences estimation efficiency and derive the optimal covariate allocation, which leads to the lowest variance. Our results show that the optimal allocation does not necessarily follow the exact distribution of the target cohort, but is instead adjusted for the conditional variability of potential outcomes. We formulate a metric to compare and choose among candidate RCT sample compositions. We also develop variations of our main results to cater for practical scenarios with various cost constraints and precision requirements. The ultimate goal of this paper is to provide practitioners with a clear and actionable strategy for selecting RCT samples that will lead to efficient causal inference.
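The claim that the best RCT composition need not mirror the target cohort can be illustrated with a small sketch. This is not the paper's estimator or metric; it is the classical Neyman-style allocation for a post-stratified estimate, under the assumptions that covariates fall into a few discrete strata, that the target proportions are known from observational data, and that a per-unit conditional standard deviation of the effect estimate is available for each stratum. All names and numbers below are hypothetical.

```python
def stratified_variance(target_props, cond_sds, rct_props, n):
    """Variance of sum_k p_k * tau_hat_k when stratum k receives n * q_k RCT units.

    Assumes Var(tau_hat_k) = s_k**2 / (n * q_k), an illustrative simplification.
    """
    return sum(
        p ** 2 * s ** 2 / (n * q)
        for p, s, q in zip(target_props, cond_sds, rct_props)
    )


def neyman_allocation(target_props, cond_sds):
    """RCT shares q_k proportional to p_k * s_k, which minimize the variance above."""
    weights = [p * s for p, s in zip(target_props, cond_sds)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical inputs: target shares and conditional standard deviations per stratum.
target_props = [0.5, 0.3, 0.2]
cond_sds = [1.0, 2.0, 4.0]
n = 1000  # total RCT sample size (assumed)

optimal = neyman_allocation(target_props, cond_sds)

# Compare an RCT that copies the target distribution with the variance-adjusted one.
var_matched = stratified_variance(target_props, cond_sds, target_props, n)
var_optimal = stratified_variance(target_props, cond_sds, optimal, n)

print("variance-adjusted RCT shares:", [round(q, 3) for q in optimal])
print("variance, RCT matched to target:", var_matched)
print("variance, adjusted allocation:", var_optimal)
```

In this toy setting the adjusted allocation oversamples the high-variability stratum relative to its share of the target population and yields a lower variance than simply copying the target distribution, which is the qualitative behavior the abstract describes.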
