Paper Title

Deep Hierarchical Planning from Pixels

Authors

Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel

Abstract

Intelligent agents need to select long sequences of actions to solve complex tasks. While humans easily break down tasks into subgoals and reach them through millions of muscle commands, current artificial intelligence is limited to tasks with horizons of a few hundred decisions, despite large compute budgets. Research on hierarchical reinforcement learning aims to overcome this limitation but has proven challenging: current methods rely on manually specified goal spaces or subtasks, and no general solution exists. We introduce Director, a practical method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model. The high-level policy maximizes task and exploration rewards by selecting latent goals, and the low-level policy learns to achieve those goals. Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization. Director outperforms exploration methods on tasks with sparse rewards, including 3D maze traversal with a quadruped robot from an egocentric camera and proprioception, without access to the global position or top-down view used by prior work. Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
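
The abstract describes a two-level structure: a high-level policy that selects goals in the latent space of a learned world model, and a low-level policy that acts to reach them. The sketch below illustrates that control loop; it is a minimal interpretation of the abstract, not the authors' implementation, and every name in it (DirectorSketch, encode, decode, select_goal, goal_interval) is a hypothetical stand-in.

```python
# A minimal sketch of the hierarchical control loop described in the
# abstract -- an illustration, not the authors' code. The world model,
# policies, and their method names are all hypothetical stand-ins.

class DirectorSketch:
    """High-level policy picks latent goals; low-level policy pursues them."""

    def __init__(self, world_model, manager, worker, goal_interval=8):
        self.world_model = world_model      # maps pixels to latent states and back
        self.manager = manager              # high-level policy over latent goals
        self.worker = worker                # low-level policy over primitive actions
        self.goal_interval = goal_interval  # steps between high-level decisions
        self.goal = None

    def act(self, observation, step):
        # Encode the raw pixel observation into the world model's latent state.
        latent = self.world_model.encode(observation)
        # Every K steps the high-level policy commits to a new latent goal;
        # per the abstract, it is trained on task plus exploration rewards.
        if step % self.goal_interval == 0:
            self.goal = self.manager.select_goal(latent)
        # The low-level policy outputs a primitive action that moves the
        # agent toward the current goal.
        return self.worker.act(latent, self.goal)

    def visualize_goal(self):
        # Although goals live in latent space, they remain interpretable:
        # the world model can decode a goal back into an image.
        return self.world_model.decode(self.goal)
```

The design point the abstract emphasizes is that both levels operate inside the world model's latent space, which is what makes the high-level decisions both compact to plan over and decodable into images for inspection.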
