Paper Title
POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning
Paper Authors
Paper Abstract
Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
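The trade-off the abstract describes — fitting the observed data while ensuring the learned model supports good planning — can be sketched as a single objective. The notation below (POMDP parameters $\theta$, observations $o_{1:T}$, actions $a_{1:T}$, trade-off weight $\lambda$, and the value $V$ of the policy planned under $\theta$) is illustrative, not necessarily the paper's exact formulation:

```latex
% Hypothetical sketch of a prediction-constrained objective:
% generative fit plus (weighted) value of the policy planned under the model.
\max_{\theta} \;
  \underbrace{\log p_{\theta}\!\left(o_{1:T} \mid a_{1:T}\right)}_{\text{generative model fit}}
  \; + \;
  \lambda \,
  \underbrace{V\!\left(\pi^{*}_{\theta}\right)}_{\text{value of the policy planned under } \theta}
```

Under a sketch like this, $\lambda = 0$ recovers the standard two-stage approach (maximum-likelihood model fitting followed by planning), while larger $\lambda$ steers the model toward parameters that yield high-performing policies even if they explain planning-irrelevant observations less well.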