Paper Title
Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning
Paper Authors
Paper Abstract
We present HiDe, a novel hierarchical reinforcement learning architecture that successfully solves long-horizon control tasks and generalizes to unseen test scenarios. Functional decomposition between planning and low-level control is achieved by explicitly separating the state-action spaces across the hierarchy, which allows the integration of task-relevant knowledge per layer. We propose an RL-based planner to efficiently leverage the information in the planning layer of the hierarchy, while the control layer learns a goal-conditioned control policy. The hierarchy is trained jointly but allows for the modular transfer of policy layers across hierarchies of different agents. We experimentally show that our method generalizes across unseen test environments and can scale to 3x the horizon length compared to both learning-based and non-learning-based methods. We evaluate on complex continuous control tasks with sparse rewards, including navigation and robot manipulation.
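To make the functional decomposition described in the abstract concrete, the following minimal Python sketch shows one decision step of a two-layer hierarchy in which a planner operates only on a reduced planning state (e.g., the agent's position) and emits a subgoal consumed by a goal-conditioned low-level controller. All names, dimensions, and the placeholder policies are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical dimensions; none of these values are taken from the paper.
FULL_STATE_DIM = 17   # e.g., joint angles and velocities of a locomotion agent
PLAN_STATE_DIM = 2    # planner only sees the agent's (x, y) position
ACTION_DIM = 6        # low-level torque commands
SUBGOAL_DIM = 2       # planner emits (x, y) waypoints as subgoals


def planner_policy(plan_state: np.ndarray, final_goal: np.ndarray) -> np.ndarray:
    """Stand-in for the learned RL planner: maps the reduced planning state and
    the task goal to a subgoal for the control layer. Here it simply steps a
    fixed distance toward the goal as a placeholder."""
    direction = final_goal - plan_state
    norm = np.linalg.norm(direction)
    if norm < 1e-8:
        return plan_state
    step = 1.0
    return plan_state + (direction / norm) * min(step, norm)


def control_policy(full_state: np.ndarray, subgoal: np.ndarray) -> np.ndarray:
    """Stand-in for the goal-conditioned low-level policy: maps the full
    proprioceptive state plus the current subgoal to an action. Here it returns
    a random action as a placeholder for a trained policy network."""
    rng = np.random.default_rng(0)
    return rng.uniform(-1.0, 1.0, size=ACTION_DIM)


def hierarchy_step(full_state: np.ndarray, final_goal: np.ndarray) -> np.ndarray:
    """One step of the two-layer hierarchy: the planner sees only the reduced
    state (explicit separation of the state space), while the controller sees
    the full state conditioned on the planner's subgoal."""
    plan_state = full_state[:PLAN_STATE_DIM]          # state-space separation
    subgoal = planner_policy(plan_state, final_goal)  # planning layer
    action = control_policy(full_state, subgoal)      # control layer
    return action


if __name__ == "__main__":
    state = np.zeros(FULL_STATE_DIM)
    goal = np.array([5.0, 3.0])
    print(hierarchy_step(state, goal))
```

The key design point illustrated here is that the planning layer never observes the full proprioceptive state, and the control layer never observes the final task goal; each layer only receives the information relevant to its role, which is what allows the abstract's claimed modular transfer of policy layers across agents.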