Paper Title
Zero-Shot Policy Transfer with Disentangled Task Representation of Meta-Reinforcement Learning
Paper Authors
Paper Abstract
Humans are capable of abstracting various tasks as different combinations of multiple attributes. This compositional perspective is vital for rapid human learning and adaptation, since previous experience from related tasks can be combined to generalize across novel compositional settings. In this work, we aim to achieve zero-shot policy generalization of Reinforcement Learning (RL) agents by leveraging task compositionality. Our proposed method is a meta-RL algorithm with a disentangled task representation that explicitly encodes different aspects of the tasks. Policy generalization is then performed by inferring representations of unseen compositional tasks via the obtained disentanglement, without extra exploration. The evaluation is conducted on three simulated tasks and a challenging real-world robotic insertion task. Experimental results demonstrate that our proposed method achieves policy generalization to unseen compositional tasks in a zero-shot manner.
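To make the abstract's core idea concrete, below is a minimal sketch (not the authors' implementation) of how a disentangled task representation can enable zero-shot composition: per-attribute latent factors are inferred from seen tasks and recombined to represent an unseen attribute combination, which then conditions the policy without any new exploration. All module names, dimensions, and the recombination step are illustrative assumptions.

```python
# Minimal sketch, assuming a per-attribute encoder head and a task-conditioned policy.
# Not the paper's actual architecture; only illustrates the recombination idea.
import torch
import torch.nn as nn


class DisentangledTaskEncoder(nn.Module):
    """Encodes task context into separate latent factors, one per task attribute."""

    def __init__(self, context_dim, num_attributes, factor_dim):
        super().__init__()
        # One head per attribute so each latent factor captures one aspect of the task.
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(context_dim, 64), nn.ReLU(), nn.Linear(64, factor_dim))
             for _ in range(num_attributes)]
        )

    def forward(self, context):
        # Returns a list of per-attribute factors, e.g. [z_goal, z_dynamics].
        return [head(context) for head in self.heads]


class TaskConditionedPolicy(nn.Module):
    """Policy conditioned on the concatenated task representation."""

    def __init__(self, obs_dim, task_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + task_dim, 128), nn.ReLU(), nn.Linear(128, act_dim)
        )

    def forward(self, obs, task_z):
        return self.net(torch.cat([obs, task_z], dim=-1))


if __name__ == "__main__":
    context_dim, obs_dim, act_dim, factor_dim = 16, 10, 4, 8
    encoder = DisentangledTaskEncoder(context_dim, num_attributes=2, factor_dim=factor_dim)
    policy = TaskConditionedPolicy(obs_dim, task_dim=2 * factor_dim, act_dim=act_dim)

    # Contexts collected from two *seen* training tasks (dummy data here).
    ctx_task_a = torch.randn(1, context_dim)  # e.g. goal A with dynamics A
    ctx_task_b = torch.randn(1, context_dim)  # e.g. goal B with dynamics B
    z_goal_a, _ = encoder(ctx_task_a)
    _, z_dyn_b = encoder(ctx_task_b)

    # Zero-shot composition: recombine factors to represent the *unseen* task
    # (goal A, dynamics B) without collecting any experience in that task.
    z_unseen = torch.cat([z_goal_a, z_dyn_b], dim=-1)
    action = policy(torch.randn(1, obs_dim), z_unseen)
    print(action.shape)  # torch.Size([1, 4])
```

The key design choice illustrated here is that the policy never sees raw task identities, only the concatenated factors; as long as each factor is disentangled during meta-training, new combinations of factors remain in-distribution for the policy.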