Paper Title

Intra-agent speech permits zero-shot task acquisition

Paper Authors

Chen Yan, Federico Carnevale, Petko Georgiev, Adam Santoro, Aurelia Guy, Alistair Muldal, Chia-Chun Hung, Josh Abramson, Timothy Lillicrap, Gregory Wayne

Paper Abstract

Human language learners are exposed to a trickle of informative, context-sensitive language, but a flood of raw sensory data. Through both social language use and internal processes of rehearsal and practice, language learners are able to build high-level, semantic representations that explain their perceptions. Here, we take inspiration from such processes of "inner speech" in humans (Vygotsky, 1934) to better understand the role of intra-agent speech in embodied behavior. First, we formally pose intra-agent speech as a semi-supervised problem and develop two algorithms that enable visually grounded captioning with little labeled language data. We then experimentally compute scaling curves over different amounts of labeled data and compare the data efficiency against a supervised learning baseline. Finally, we incorporate intra-agent speech into an embodied, mobile manipulator agent operating in a 3D virtual world, and show that with as few as 150 additional image captions, intra-agent speech endows the agent with the ability to manipulate and answer questions about a new object without any related task-directed experience (zero-shot). Taken together, our experiments suggest that modelling intra-agent speech is effective in enabling embodied agents to learn new tasks efficiently and without direct interaction experience.
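
The abstract does not spell out the paper's two semi-supervised algorithms. As a rough illustration of the general setting only, the sketch below shows a generic pseudo-labelling (self-training) loop for grounded captioning with a small labelled set and abundant unlabelled images; all names (`CaptionModel`, `train_step`, `caption`, the confidence threshold) are hypothetical placeholders and do not describe the paper's method.

```python
# Minimal, hypothetical sketch of semi-supervised captioning via
# pseudo-labelling (self-training). NOT the paper's algorithm; every
# name below is a placeholder standing in for a real model/trainer.

import random
from typing import List, Tuple


class CaptionModel:
    """Stand-in captioner: replace with a real image-to-text model."""

    def train_step(self, image: object, caption: str) -> None:
        pass  # update parameters on one (image, caption) pair

    def caption(self, image: object) -> Tuple[str, float]:
        # return a generated caption and a confidence score in [0, 1]
        return "a plausible caption", random.random()


def semi_supervised_training(
    labelled: List[Tuple[object, str]],   # small set of (image, caption) pairs
    unlabelled: List[object],             # abundant raw images without captions
    rounds: int = 3,
    threshold: float = 0.9,
) -> CaptionModel:
    model = CaptionModel()

    # 1. Supervised warm-up on the small labelled set.
    for image, caption in labelled:
        model.train_step(image, caption)

    # 2. Self-training: caption unlabelled images, keep confident outputs
    #    as pseudo-labels, and retrain on the combined data.
    for _ in range(rounds):
        pseudo: List[Tuple[object, str]] = []
        for image in unlabelled:
            text, confidence = model.caption(image)
            if confidence >= threshold:
                pseudo.append((image, text))
        for image, caption in labelled + pseudo:
            model.train_step(image, caption)

    return model
```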
