Paper Title
Learning to Follow Instructions in Text-Based Games
Paper Authors
Paper Abstract
Text-based games present a unique class of sequential decision-making problems in which agents interact with a partially observable, simulated environment via actions and observations conveyed through natural language. Such observations typically include instructions that, in a reinforcement learning (RL) setting, can directly or indirectly guide a player towards completing reward-worthy tasks. In this work, we study the ability of RL agents to follow such instructions. We conduct experiments showing that the performance of state-of-the-art text-based game agents is largely unaffected by the presence or absence of such instructions, and that these agents are typically unable to execute tasks to completion. To further study and address the task of instruction following, we equip RL agents with an internal structured representation of natural language instructions in the form of Linear Temporal Logic (LTL), a formal language that is increasingly used for temporally extended reward specification in RL. Our framework both supports and highlights the benefit of understanding the temporal semantics of instructions and of measuring progress towards the achievement of such temporally extended behaviour. Experiments with 500+ games in TextWorld demonstrate the superior performance of our approach.
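The core mechanism behind LTL-based progress tracking can be illustrated with a minimal sketch of formula "progression": after each step, the current LTL formula is rewritten against the propositions just observed, so the remaining formula records exactly what is left to do. This is a generic illustration under simplifying assumptions, not the paper's implementation; the tuple encoding and proposition names such as `take_key` and `open_door` are hypothetical.

```python
def progress(formula, true_props):
    """Rewrite `formula` given the set of propositions observed this step."""
    if formula in ("True", "False"):
        return formula
    op = formula[0]
    if op == "prop":                       # atomic proposition, e.g. "take_key"
        return "True" if formula[1] in true_props else "False"
    if op == "and":
        left = progress(formula[1], true_props)
        right = progress(formula[2], true_props)
        if "False" in (left, right):
            return "False"
        if left == "True":
            return right
        if right == "True":
            return left
        return ("and", left, right)
    if op == "or":
        left = progress(formula[1], true_props)
        right = progress(formula[2], true_props)
        if "True" in (left, right):
            return "True"
        if left == "False":
            return right
        if right == "False":
            return left
        return ("or", left, right)
    if op == "eventually":                 # F phi: prog(F phi) = prog(phi) | F phi
        inner = progress(formula[1], true_props)
        if inner == "True":
            return "True"
        if inner == "False":
            return formula                 # no progress yet; keep waiting
        return ("or", inner, formula)      # partial progress, or restart later
    raise ValueError(f"unknown operator: {op}")

# "Take the key, then open the door" encoded as F(take_key & F(open_door)):
# sequencing is captured by nesting one 'eventually' inside the other.
task = ("eventually", ("and", ("prop", "take_key"),
                              ("eventually", ("prop", "open_door"))))

t1 = progress(task, {"go_east"})   # irrelevant event: task unchanged
t2 = progress(task, {"take_key"})  # first subgoal met: remainder is tracked
t3 = progress(t2, {"open_door"})   # second subgoal met: formula is "True"
```

Because the progressed formula shrinks as subgoals are achieved, it doubles as a dense progress signal for the agent, which is the benefit the abstract attributes to measuring progress towards temporally extended behaviour.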