Paper title
Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs
Paper authors
Paper abstract
We present two elegant solutions for modeling continuous-time dynamics, in a novel model-based reinforcement learning (RL) framework for semi-Markov decision processes (SMDPs), using neural ordinary differential equations (ODEs). Our models accurately characterize continuous-time dynamics and enable us to develop high-performing policies using a small amount of data. We also develop a model-based approach for optimizing time schedules to reduce interaction rates with the environment while maintaining near-optimal performance, which is not possible for model-free methods. We experimentally demonstrate the efficacy of our methods across various continuous-time domains.
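The core idea of using a neural ODE as an SMDP dynamics model can be illustrated with a minimal sketch. The code below is a hypothetical illustration, not the paper's implementation: a small randomly initialized network stands in for a learned vector field ds/dt = f(s, a), and fixed-step RK4 integration predicts the next state over an arbitrary elapsed time dt, reflecting the irregular decision intervals of an SMDP. All names (`f`, `predict_next_state`) and the network shape are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP standing in for the learned vector field ds/dt = f(s, a).
# Weights are random here purely for illustration; in the paper's
# framework they would be fit to observed transitions.
W1 = rng.normal(scale=0.1, size=(8, 3))   # input: state (2 dims) + action (1 dim)
b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(2, 8))   # output: ds/dt (2 dims)
b2 = np.zeros(2)

def f(s, a):
    """Neural-network vector field: approximate ds/dt given state and action."""
    x = np.concatenate([s, a])
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def predict_next_state(s, a, dt, steps=20):
    """Integrate the ODE over a (possibly irregular) duration dt with
    fixed-step RK4, as a continuous-time SMDP transition model would."""
    h = dt / steps
    for _ in range(steps):
        k1 = f(s, a)
        k2 = f(s + 0.5 * h * k1, a)
        k3 = f(s + 0.5 * h * k2, a)
        k4 = f(s + h * k3, a)
        s = s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return s

s0 = np.array([1.0, -0.5])
a = np.array([0.3])
# The same model answers "where is the state after 0.1s?" and "after 1.0s?",
# which is what lets a planner optimize the time schedule itself.
s_short = predict_next_state(s0, a, dt=0.1)
s_long = predict_next_state(s0, a, dt=1.0)
```

Because the model is a continuous-time flow rather than a fixed-step transition function, the interaction interval dt becomes a quantity the agent can optimize over, which is the capability the abstract attributes to the model-based approach.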