Paper Title

Continual Learning of Control Primitives: Skill Discovery via Reset-Games

Paper Authors

Kelvin Xu, Siddharth Verma, Chelsea Finn, Sergey Levine

Paper Abstract

Reinforcement learning has the potential to automate the acquisition of behavior in complex settings, but in order for it to be successfully deployed, a number of practical challenges must be addressed. First, in real world settings, when an agent attempts a task and fails, the environment must somehow "reset" so that the agent can attempt the task again. While easy in simulation, this could require considerable human effort in the real world, especially if the number of trials is very large. Second, real world learning often involves complex, temporally extended behavior that is often difficult to acquire with random exploration. While these two problems may at first appear unrelated, in this work, we show how a single method can allow an agent to acquire skills with minimal supervision while removing the need for resets. We do this by exploiting the insight that the need to "reset" an agent to a broad set of initial states for a learning task provides a natural setting to learn a diverse set of "reset-skills". We propose a general-sum game formulation that balances the objectives of resetting and learning skills, and demonstrate that this approach improves performance on reset-free tasks, and additionally show that the skills we obtain can be used to significantly accelerate downstream learning.
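To make the alternation described in the abstract concrete, below is a minimal, purely illustrative Python sketch of a reset-free training loop: a forward player practices the task from wherever it ends up, then a sampled reset-skill is run and rewarded with a DIAYN-style diversity bonus for reaching a state from which the skill can be identified, so the set of reset states doubles as a broad initial-state distribution for the next attempt. The toy 1-D environment, the hand-coded policies, and all names here are assumptions for illustration, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D point environment: the state is a position in [-1, 1] and the
# task is to reach GOAL. After the very first state, the environment is
# never externally reset; reset-skills take the place of manual resets.
GOAL, N_SKILLS, HORIZON = 1.0, 4, 25
state = 0.0

# Each reset-skill k drives the state toward its own target. A learned
# diversity objective would spread these out; here they are fixed.
skill_targets = rng.uniform(-1.0, 1.0, size=N_SKILLS)

def log_softmax(x):
    z = x - x.max()
    return z - np.log(np.exp(z).sum())

def forward_step(s):
    # Crude stand-in for the task policy: a noisy step toward the goal.
    return float(np.clip(s + 0.1 * np.sign(GOAL - s) + 0.05 * rng.normal(), -1.0, 1.0))

def reset_step(s, k):
    # Skill-conditioned reset policy: a noisy step toward skill k's target.
    return float(np.clip(s + 0.1 * np.sign(skill_targets[k] - s) + 0.05 * rng.normal(), -1.0, 1.0))

def diversity_reward(s_final, k):
    # DIAYN-style reward log q(k | s) + log N: the reset-skill is rewarded
    # for ending in a state from which a simple Gaussian "discriminator"
    # can tell which skill was run.
    logits = -((s_final - skill_targets) ** 2) / 0.02
    return float(log_softmax(logits)[k] + np.log(N_SKILLS))

for episode in range(10):
    # Phase 1: the forward player attempts the task from its current state.
    for _ in range(HORIZON):
        state = forward_step(state)
    task_return = -abs(GOAL - state)

    # Phase 2: a reset-skill is sampled and run; diverse, identifiable
    # reset states serve as the initial states for the next attempt.
    k = int(rng.integers(N_SKILLS))
    for _ in range(HORIZON):
        state = reset_step(state, k)
    skill_reward = diversity_reward(state, k)

    # In the paper's general-sum game, each player would update its own
    # policy on its own reward here; this sketch only logs both rewards.
    print(f"ep {episode:02d}  task_return={task_return:+.3f}  skill={k}  skill_reward={skill_reward:+.3f}")

In a real instantiation, both phases would be driven by learned RL policies and the diversity discriminator would itself be trained; the sketch is only meant to show the alternating structure and the two separate objectives that the general-sum game balances.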
