Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration

Paper Title

Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration

Authors

Gao, Zijian; Li, YiYing; Xu, Kele; Zhai, Yuanzhao; Feng, Dawei; Ding, Bo; Mao, XinJun; Wang, Huaimin

Abstract

The sparsity of extrinsic rewards poses a serious challenge for reinforcement learning (RL). Many curiosity-based methods have been proposed to supply an intrinsic reward that drives effective exploration, yet the challenge is still far from solved. In this paper, we present a novel curiosity method for RL, named DyMeCu, which stands for Dynamic Memory-based Curiosity. Inspired by human curiosity and information theory, DyMeCu consists of a dynamic memory and dual online learners. Curiosity is aroused when the memorized information cannot deal with the current state; the information gap between the dual learners is formulated as the intrinsic reward for the agent, and the state information is then consolidated into the dynamic memory. Compared with previous curiosity methods, DyMeCu better mimics human curiosity through its dynamic memory, and the memory module grows dynamically based on a bootstrap paradigm with the dual learners. Large-scale empirical experiments on multiple benchmarks, including DeepMind Control Suite and Atari Suite, demonstrate that DyMeCu outperforms competitive curiosity-based methods with or without extrinsic rewards. We will release the code to enhance reproducibility.
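
The mechanism summarized in the abstract (a dynamic memory, dual online learners, and an intrinsic reward taken from the gap between the learners) can be illustrated with a toy sketch. The abstract does not give the update rules, so everything concrete here is an assumption made for illustration: the linear encoders, the squared-distance reward, the learners regressing toward the memory's output, the exponential-moving-average memory update, and the hypothetical class name `ToyDynamicMemoryCuriosity`. It is not the authors' implementation.

```python
import numpy as np


class ToyDynamicMemoryCuriosity:
    """Toy linear illustration only; all update rules are assumptions."""

    def __init__(self, state_dim, latent_dim, lr=0.01, ema=0.99, seed=0):
        rng = np.random.default_rng(seed)
        # Memory encoder: a single linear map from states to latents.
        self.memory = rng.normal(scale=0.1, size=(latent_dim, state_dim))
        # Dual online learners: two differently perturbed copies of the memory.
        self.learners = [
            self.memory + rng.normal(scale=0.1, size=self.memory.shape)
            for _ in range(2)
        ]
        self.lr = lr
        self.ema = ema

    def intrinsic_reward(self, state):
        # Assumed reward: squared distance between the two learners' encodings.
        z1, z2 = (w @ state for w in self.learners)
        return float(np.sum((z1 - z2) ** 2))

    def update(self, state):
        # Each learner takes one gradient step on 0.5 * ||w s - M s||^2,
        # i.e. it regresses toward the memory's encoding of the state.
        target = self.memory @ state
        for i, w in enumerate(self.learners):
            err = w @ state - target
            self.learners[i] = w - self.lr * np.outer(err, state)
        # Assumed consolidation: the memory tracks the learners' average
        # through an exponential moving average.
        mean_w = 0.5 * (self.learners[0] + self.learners[1])
        self.memory = self.ema * self.memory + (1.0 - self.ema) * mean_w


curiosity = ToyDynamicMemoryCuriosity(state_dim=4, latent_dim=3)
state = np.ones(4)
reward_before = curiosity.intrinsic_reward(state)
for _ in range(50):
    curiosity.update(state)
reward_after = curiosity.intrinsic_reward(state)
print(reward_before, reward_after)
```

In this toy version, repeated updates on the same state shrink the gap between the two learners, so the reward for a familiar state decays while unfamiliar states keep a larger reward. That matches the qualitative behavior the abstract describes, but only under the assumptions stated above.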
