Paper title
Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs
Paper authors
Paper abstract
We present two elegant solutions for modeling continuous-time dynamics, in a novel model-based reinforcement learning (RL) framework for semi-Markov decision processes (SMDPs), using neural ordinary differential equations (ODEs). Our models accurately characterize continuous-time dynamics and enable us to develop high-performing policies using a small amount of data. We also develop a model-based approach for optimizing time schedules to reduce interaction rates with the environment while maintaining near-optimal performance, which is not possible for model-free methods. We experimentally demonstrate the efficacy of our methods across various continuous-time domains.
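The core idea of using a neural ODE as an SMDP dynamics model can be illustrated with a minimal sketch. The code below is a hypothetical illustration, not the paper's implementation: a small randomly initialized network stands in for a learned vector field ds/dt = f(s, a), and fixed-step RK4 integration predicts the next state over an arbitrary elapsed time dt, reflecting the irregular decision intervals of an SMDP. All names (`f`, `predict_next_state`) and the network shape are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP standing in for the learned vector field ds/dt = f(s, a).
# Weights are random here purely for illustration; in the paper's
# framework they would be fit to observed transitions.
W1 = rng.normal(scale=0.1, size=(8, 3))   # input: state (2 dims) + action (1 dim)
b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(2, 8))   # output: ds/dt (2 dims)
b2 = np.zeros(2)

def f(s, a):
    """Neural-network vector field: approximate ds/dt given state and action."""
    x = np.concatenate([s, a])
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def predict_next_state(s, a, dt, steps=20):
    """Integrate the ODE over a (possibly irregular) duration dt with
    fixed-step RK4, as a continuous-time SMDP transition model would."""
    h = dt / steps
    for _ in range(steps):
        k1 = f(s, a)
        k2 = f(s + 0.5 * h * k1, a)
        k3 = f(s + 0.5 * h * k2, a)
        k4 = f(s + h * k3, a)
        s = s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return s

s0 = np.array([1.0, -0.5])
a = np.array([0.3])
# The same model answers "where is the state after 0.1s?" and "after 1.0s?",
# which is what lets a planner optimize the time schedule itself.
s_short = predict_next_state(s0, a, dt=0.1)
s_long = predict_next_state(s0, a, dt=1.0)
```

Because the model is a continuous-time flow rather than a fixed-step transition function, the interaction interval dt becomes a quantity the agent can optimize over, which is the capability the abstract attributes to the model-based approach.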