Valid post-selection inference in Robust Q-learning

Paper title

Valid post-selection inference in Robust Q-learning

Authors

Jeremiah Jones, Ashkan Ertefaie, Robert L. Strawderman

Abstract

Constructing an optimal adaptive treatment strategy becomes complex when there are a large number of potential tailoring variables. In such scenarios, many of these extraneous variables may contribute little or no benefit to an adaptive strategy while increasing implementation costs and putting an undue burden on patients. Although existing methods allow selection of the informative prognostic factors, statistical inference is complicated by the data-driven selection process. To remedy this deficiency, we adapt the Universal Post-Selection Inference procedure to the semiparametric Robust Q-learning method and the unique challenges encountered in such multistage decision methods. In the process, we also identify a uniform improvement to confidence intervals constructed in this post-selection inference framework. Under certain rate assumptions, we provide theoretical results that demonstrate the validity of confidence regions and tests constructed from our proposed procedure. The performance of our method is compared to the Selective Inference framework through simulation studies, demonstrating the strengths of our procedure and its applicability to multiple selection mechanisms.
