Paper Title

POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning

Authors

Joseph Futoma, Michael C. Hughes, Finale Doshi-Velez

Abstract

Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
