Paper Title
Multi-Horizon Representations with Hierarchical Forward Models for Reinforcement Learning
Paper Authors
Paper Abstract
Learning control from pixels is difficult for reinforcement learning (RL) agents because representation learning and policy learning are intertwined. Previous approaches remedy this issue with auxiliary representation learning tasks, but they either do not consider the temporal aspect of the problem or only consider single-step transitions, which may cause learning inefficiencies if important environmental changes take many steps to manifest. We propose Hierarchical $k$-Step Latent (HKSL), an auxiliary task that learns multiple representations via a hierarchy of forward models that learn to communicate and an ensemble of $n$-step critics that all operate at varying magnitudes of step skipping. We evaluate HKSL in a suite of 30 robotic control tasks with and without distractors and a task of our creation. We find that HKSL either converges to higher or optimal episodic returns more quickly than several alternative representation learning approaches. Furthermore, we find that HKSL's representations capture task-relevant details accurately across timescales (even in the presence of distractors) and that communication channels between hierarchy levels organize information based on both sides of the communication process, both of which improve sample efficiency.
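Below is a minimal sketch of the core idea the abstract describes: a hierarchy of latent forward models in which each level advances the latent state by a different number of environment steps per model step, with a simple message passed from coarser to finer levels. This is an illustrative assumption-laden reconstruction, not the authors' implementation; the module names, dimensions, and the mean-pooled "communication" message are all placeholders, and the n-step critic ensemble is omitted.

```python
# Illustrative sketch (not the paper's code) of hierarchical step-skipping forward models.
import torch
import torch.nn as nn


class ForwardModel(nn.Module):
    """One hierarchy level: predicts the latent `skip` env-steps ahead,
    conditioned on the actions taken during those steps and on a message
    from the level above (an assumed, simplified communication scheme)."""

    def __init__(self, latent_dim: int, action_dim: int, skip: int):
        super().__init__()
        self.skip = skip
        self.net = nn.Sequential(
            nn.Linear(latent_dim + skip * action_dim + latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z: torch.Tensor, actions: torch.Tensor, msg: torch.Tensor) -> torch.Tensor:
        # actions: (batch, skip, action_dim) -> flatten the skipped-step actions
        a_flat = actions.flatten(start_dim=1)
        return self.net(torch.cat([z, a_flat, msg], dim=-1))


def hierarchical_rollout(encoder, levels, obs, actions):
    """Roll each level forward over the same action sequence at its own step skip.

    obs:     (batch, obs_dim) current observation
    actions: (batch, T, action_dim), with T divisible by every level's skip
    Returns one predicted latent sequence per level (coarse to fine).
    """
    z0 = encoder(obs)
    _, T, _ = actions.shape
    predictions = []
    msg = torch.zeros_like(z0)  # topmost (coarsest) level gets a zero message
    for model in levels:  # ordered coarsest (largest skip) to finest
        z, preds = z0, []
        for t in range(0, T, model.skip):
            z = model(z, actions[:, t:t + model.skip], msg)
            preds.append(z)
        preds = torch.stack(preds, dim=1)
        predictions.append(preds)
        # "Communication": pass a summary of this level's rollout to the next,
        # finer level (a mean over predicted latents as a stand-in).
        msg = preds.mean(dim=1)
    return predictions


if __name__ == "__main__":
    latent_dim, action_dim, T = 64, 4, 6
    encoder = nn.Linear(32, latent_dim)  # stand-in for the pixel encoder
    levels = [ForwardModel(latent_dim, action_dim, skip=s) for s in (3, 1)]
    obs = torch.randn(8, 32)
    actions = torch.randn(8, T, action_dim)
    preds = hierarchical_rollout(encoder, levels, obs, actions)
    print([p.shape for p in preds])  # coarse level: 2 rollout steps; fine level: 6
```

In this sketch, the coarse level covers the same action horizon in fewer rollout steps, which is one way to realize "varying magnitudes of step skipping" from the abstract; in practice each level's predictions would be trained against encoded future observations at the matching offsets.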