Paper Title
Abstract Demonstrations and Adaptive Exploration for Efficient and Stable Multi-step Sparse Reward Reinforcement Learning
Paper Authors
Paper Abstract
Although Deep Reinforcement Learning (DRL) has become popular in many disciplines including robotics, state-of-the-art DRL algorithms still struggle to learn long-horizon, multi-step, sparse-reward tasks, such as stacking several blocks given only a task-completion reward signal. To improve learning efficiency on such tasks, this paper proposes a DRL exploration technique, termed A^2, which integrates two components inspired by human experience: Abstract demonstrations and Adaptive exploration. A^2 first decomposes a complex task into subtasks and then provides the correct order in which to learn them. During training, the agent explores the environment adaptively, acting more deterministically on well-mastered subtasks and more stochastically on poorly learned ones. Ablation and comparative experiments are conducted on several grid-world tasks and three robotic manipulation tasks. We demonstrate that A^2 helps popular DRL algorithms (DQN, DDPG, and SAC) learn more efficiently and stably in these environments.
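The abstract only sketches the adaptive-exploration idea, so the following is a minimal illustrative sketch (not the authors' implementation) of one way it could work: keep a running success estimate per subtask and set a subtask-dependent exploration rate, so mastered subtasks are acted on more deterministically. All names here (SubtaskExplorer, update, epsilon, act) are hypothetical and not taken from the paper.

```python
import random


class SubtaskExplorer:
    """Per-subtask epsilon-greedy exploration (illustrative sketch only)."""

    def __init__(self, num_subtasks, eps_min=0.05, eps_max=1.0):
        self.success = [0.0] * num_subtasks  # running success estimate per subtask
        self.eps_min = eps_min
        self.eps_max = eps_max

    def update(self, subtask_id, succeeded, lr=0.1):
        # Exponential moving average of whether this subtask was completed.
        s = self.success[subtask_id]
        self.success[subtask_id] = (1 - lr) * s + lr * float(succeeded)

    def epsilon(self, subtask_id):
        # High success -> low exploration rate; ill-learnt subtasks stay near eps_max.
        s = self.success[subtask_id]
        return self.eps_max - (self.eps_max - self.eps_min) * s

    def act(self, subtask_id, greedy_action, action_space):
        # Epsilon-greedy action selection with a subtask-dependent epsilon.
        if random.random() < self.epsilon(subtask_id):
            return random.choice(action_space)
        return greedy_action
```

For example, an agent currently on the "grasp block" subtask with a high success estimate would mostly take its greedy action there, while remaining highly exploratory on a later, not-yet-mastered "stack block" subtask.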