Paper Title

Self-Paced Deep Reinforcement Learning

Authors

Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen

Abstract

Curriculum reinforcement learning (CRL) improves the learning speed and stability of an agent by exposing it to a tailored series of tasks throughout learning. Despite empirical successes, an open question in CRL is how to automatically generate a curriculum for a given reinforcement learning (RL) agent, avoiding manual design. In this paper, we propose an answer by interpreting the curriculum generation as an inference problem, where distributions over tasks are progressively learned to approach the target task. This approach leads to an automatic curriculum generation, whose pace is controlled by the agent, with solid theoretical motivation and easily integrated with deep RL algorithms. In the conducted experiments, the curricula generated with the proposed algorithm significantly improve learning performance across several environments and deep RL algorithms, matching or outperforming state-of-the-art existing CRL algorithms.
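The core idea in the abstract, a task distribution that is progressively moved toward the target task at a pace controlled by the agent, can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's exact inference-based (SPDL) update: tasks are parameterised by a scalar context drawn from a Gaussian, and the distribution is interpolated toward the target distribution only when the agent's recent return clears a threshold. The class name, the `perf_threshold` and `step` parameters, and the interpolation rule are all illustrative assumptions.

```python
import numpy as np

class SelfPacedCurriculum:
    """Hypothetical sketch of a self-paced curriculum: the task
    distribution moves toward the target task only when the agent
    performs well enough, so the agent controls the pace."""

    def __init__(self, init_mean, init_std, target_mean, target_std,
                 perf_threshold=0.5, step=0.2):
        self.mean, self.std = init_mean, init_std
        self.target_mean, self.target_std = target_mean, target_std
        self.perf_threshold = perf_threshold  # return required to advance
        self.step = step                      # interpolation rate per update

    def sample_task(self, rng):
        # Draw a task context from the current Gaussian curriculum.
        return rng.normal(self.mean, self.std)

    def update(self, mean_return):
        # Advance toward the target distribution only if the agent
        # already succeeds on the current distribution of tasks.
        if mean_return >= self.perf_threshold:
            self.mean += self.step * (self.target_mean - self.mean)
            self.std += self.step * (self.target_std - self.std)

# Toy usage: a stand-in agent that always achieves return 1.0,
# so the curriculum converges to the target task parameters.
rng = np.random.default_rng(0)
cur = SelfPacedCurriculum(init_mean=0.0, init_std=1.0,
                          target_mean=5.0, target_std=0.1)
for _ in range(50):
    task = cur.sample_task(rng)     # task an RL agent would train on
    mock_return = 1.0               # stand-in for the agent's evaluation
    cur.update(mock_return)
```

In the paper the update is instead derived as an inference problem with a theoretically motivated objective; the sketch above only conveys the performance-gated progression toward the target task.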
