Paper Title
RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
Paper Authors
Paper Abstract
Exploration in sparse reward environments remains one of the key challenges of model-free reinforcement learning. Instead of solely relying on extrinsic rewards provided by the environment, many state-of-the-art methods use intrinsic rewards to encourage exploration. However, we show that existing methods fall short in procedurally-generated environments where an agent is unlikely to visit a state more than once. We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation. We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid, as well as on tasks with high-dimensional observations used in prior work. Our experiments demonstrate that this approach is more sample efficient than existing exploration methods, particularly for procedurally-generated MiniGrid environments. Furthermore, we analyze the learned behavior as well as the intrinsic reward received by our agent. In contrast to previous approaches, our intrinsic reward does not diminish during the course of training and it rewards the agent substantially more for interacting with objects that it can control.
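The quantity at the heart of the abstract, how much an action changes the agent's learned state representation, reduces to a short computation: the L2 distance between consecutive state embeddings, discounted by how often the resulting state has already been visited in the current episode so that pacing back and forth is not rewarded. Below is a minimal sketch in PyTorch. The function name, the embedding network `phi`, and passing the episodic visitation counts in as a tensor are illustrative assumptions, not the authors' released implementation.

```python
import torch

def ride_intrinsic_reward(phi, obs, next_obs, episodic_counts):
    """Sketch of a RIDE-style intrinsic reward.

    phi             -- learned state-embedding network (torch.nn.Module)
    obs, next_obs   -- batched observations at steps t and t+1
    episodic_counts -- per-sample visit counts of next_obs within the
                       current episode (hypothetical bookkeeping kept
                       by the caller)
    """
    with torch.no_grad():
        emb = phi(obs)
        next_emb = phi(next_obs)
    # "Impact": how much the action changed the learned representation.
    impact = torch.norm(next_emb - emb, p=2, dim=-1)
    # Discount by the square root of the episodic visit count so that
    # revisiting the same state within an episode yields less reward.
    return impact / torch.sqrt(episodic_counts.float())
```

Because the impact is measured in a representation that keeps being trained, this signal need not decay to zero as the state space is exhausted, which is consistent with the abstract's observation that the intrinsic reward does not diminish over the course of training.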