Fast Model-based Policy Search for Universal Policy Networks

Paper Title

Fast Model-based Policy Search for Universal Policy Networks

Authors

Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana, Svetha Venkatesh

Abstract

Adapting an agent's behaviour to new environments has been one of the primary focus areas of physics-based reinforcement learning. Although recent approaches such as universal policy networks partially address this issue by enabling the storage of multiple policies trained in simulation on a wide range of dynamic/latent factors, efficiently identifying the most appropriate policy for a given environment remains a challenge. In this work, we propose a Gaussian Process-based prior learned in simulation that captures the likely performance of a policy when transferred to a previously unseen environment. We integrate this prior with a Bayesian Optimisation-based policy search process to improve the efficiency of identifying the most appropriate policy from the universal policy network. We empirically evaluate our approach in a range of continuous and discrete control environments, and show that it outperforms other competing baselines.
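
The search idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: a one-dimensional latent parameter `theta` that indexes the policies, an RBF kernel, an upper-confidence-bound acquisition rule, and the toy functions `prior_mean` (standing in for a prior learned in simulation) and `evaluate` (standing in for a rollout in the unseen environment).

```python
import numpy as np

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two 1-D arrays of latent values."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def posterior(cands, xs, ys, prior_mean, noise=1e-3):
    """GP posterior mean/std at cands, with prior_mean as the GP mean function."""
    if len(xs) == 0:
        # No target-environment data yet: fall back on the prior alone.
        return prior_mean(cands), np.ones_like(cands)
    xs, ys = np.asarray(xs), np.asarray(ys)
    K = rbf(xs, xs) + noise * np.eye(len(xs))
    Ks = rbf(cands, xs)
    mu = prior_mean(cands) + Ks @ np.linalg.solve(K, ys - prior_mean(xs))
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def search(evaluate, prior_mean, cands, budget=8, beta=2.0):
    """Bayesian-optimisation loop: pick the candidate with the highest UCB."""
    xs, ys = [], []
    for _ in range(budget):
        mu, sd = posterior(cands, xs, ys, prior_mean)
        x = float(cands[np.argmax(mu + beta * sd)])
        xs.append(x)
        ys.append(evaluate(x))  # one rollout in the target environment
    best = int(np.argmax(ys))
    return xs[best], ys[best]

# Hypothetical stand-ins (assumptions, not from the paper).
def prior_mean(theta):
    # Return profile estimated in simulation; peaks at theta = 0.6.
    return -(np.asarray(theta) - 0.6) ** 2

def evaluate(theta):
    # True return in the unseen environment; peaks at theta = 0.7.
    return -(theta - 0.7) ** 2

cands = np.linspace(0.0, 1.0, 51)
best_theta, best_return = search(evaluate, prior_mean, cands)
```

In this toy setup the first query is the prior's best guess, so the search starts near the optimum instead of exploring the latent space uniformly; that is the kind of efficiency gain the abstract attributes to an informative prior.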
