Paper Title

First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation

Paper Authors

Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat

Paper Abstract

Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper, we present a clear ablation study of post-exploration within a general intrinsically motivated goal exploration process (IMGEP) framework, which the Go-Explore paper did not provide. We isolate the effect of post-exploration by turning it on and off within the same algorithm, under both tabular and deep RL settings, on both discrete navigation and continuous control tasks. Experiments on a range of MiniGrid and MuJoCo environments show that post-exploration indeed helps IMGEP agents reach more diverse states and boosts their performance. In short, our work suggests that RL researchers should consider using post-exploration in IMGEP when possible, since it is effective, method-agnostic, and easy to implement.
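To make the mechanism the abstract describes more concrete, here is a minimal Python sketch of one IMGEP-style episode with post-exploration toggled by a flag. This is an illustration only, not the authors' implementation: `env`, `goals`, `goal_policy`, `explore_policy`, and `post_explore_steps` are hypothetical names and interfaces assumed for this sketch.

```python
import random

def run_episode(env, goals, goal_policy, explore_policy,
                post_explore=True, post_explore_steps=20):
    """One IMGEP-style episode: 'Go' toward a sampled goal, then
    optionally 'post-explore' from the state that was reached.

    All interfaces here (env.reset/step, the two policies, the
    goal list) are hypothetical placeholders for illustration.
    """
    visited = []
    state = env.reset()
    goal = random.choice(goals)  # sample a goal of interest

    # 'Go' phase: follow the goal-conditioned policy toward the goal.
    done = False
    while not done and state != goal:
        action = goal_policy(state, goal)
        state, done = env.step(action)
        visited.append(state)

    # 'Post-explore' phase: instead of stopping at the goal, keep
    # acting (e.g. randomly) to push into unknown terrain.
    if post_explore and not done:
        for _ in range(post_explore_steps):
            state, done = env.step(explore_policy(state))
            visited.append(state)
            if done:
                break

    return visited  # states collected, e.g. for diversity measures
```

Running the same loop with `post_explore=False` gives the ablation baseline; the paper's finding is that enabling the flag lets the agent reach a more diverse set of states.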
