Paper Title

First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation

Paper Authors

Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat

Paper Abstract

Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper, we present a clear ablation study of post-exploration within a general intrinsically motivated goal exploration process (IMGEP) framework, which the Go-Explore paper did not provide. We isolate the effect of post-exploration by turning it on and off within the same algorithm, under both tabular and deep RL settings, on both discrete navigation and continuous control tasks. Experiments on a range of MiniGrid and MuJoCo environments show that post-exploration indeed helps IMGEP agents reach more diverse states and boosts their performance. In short, our work suggests that RL researchers should consider using post-exploration in IMGEP when possible, since it is effective, method-agnostic, and easy to implement.
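To make the mechanism the abstract describes more concrete, here is a minimal Python sketch of one IMGEP-style episode with post-exploration toggled by a flag. This is an illustration only, not the authors' implementation: `env`, `goals`, `goal_policy`, `explore_policy`, and `post_explore_steps` are hypothetical names and interfaces assumed for this sketch.

```python
import random

def run_episode(env, goals, goal_policy, explore_policy,
                post_explore=True, post_explore_steps=20):
    """One IMGEP-style episode: 'Go' toward a sampled goal, then
    optionally 'post-explore' from the state that was reached.

    All interfaces here (env.reset/step, the two policies, the
    goal list) are hypothetical placeholders for illustration.
    """
    visited = []
    state = env.reset()
    goal = random.choice(goals)  # sample a goal of interest

    # 'Go' phase: follow the goal-conditioned policy toward the goal.
    done = False
    while not done and state != goal:
        action = goal_policy(state, goal)
        state, done = env.step(action)
        visited.append(state)

    # 'Post-explore' phase: instead of stopping at the goal, keep
    # acting (e.g. randomly) to push into unknown terrain.
    if post_explore and not done:
        for _ in range(post_explore_steps):
            state, done = env.step(explore_policy(state))
            visited.append(state)
            if done:
                break

    return visited  # states collected, e.g. for diversity measures
```

Running the same loop with `post_explore=False` gives the ablation baseline; the paper's finding is that enabling the flag lets the agent reach a more diverse set of states.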
