Paper Title
POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning
Paper Authors
Paper Abstract
Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
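The trade-off the abstract describes — fitting the observed data while ensuring the learned model supports good planning — can be sketched as a single objective. The notation below (POMDP parameters $\theta$, observations $o_{1:T}$, actions $a_{1:T}$, trade-off weight $\lambda$, and the value $V$ of the policy planned under $\theta$) is illustrative, not necessarily the paper's exact formulation:

```latex
% Hypothetical sketch of a prediction-constrained objective:
% generative fit plus (weighted) value of the policy planned under the model.
\max_{\theta} \;
  \underbrace{\log p_{\theta}\!\left(o_{1:T} \mid a_{1:T}\right)}_{\text{generative model fit}}
  \; + \;
  \lambda \,
  \underbrace{V\!\left(\pi^{*}_{\theta}\right)}_{\text{value of the policy planned under } \theta}
```

Under a sketch like this, $\lambda = 0$ recovers the standard two-stage approach (maximum-likelihood model fitting followed by planning), while larger $\lambda$ steers the model toward parameters that yield high-performing policies even if they explain planning-irrelevant observations less well.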