Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions

Paper Title

Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions

Authors

Alexey Skrynnik, Zoya Volovikova, Marc-Alexandre Côté, Anton Voronov, Artem Zholus, Negar Arabzadeh, Shrestha Mohanty, Milagro Teruel, Ahmed Awadallah, Aleksandr Panov, Mikhail Burtsev, Julia Kiseleva

Abstract

The adoption of pre-trained language models to generate action plans for embodied agents is a promising research strategy. However, executing instructions in real or simulated environments requires verifying the feasibility of actions as well as their relevance to the completion of a goal. We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment according to natural language instructions. Our method first generates a set of consistently achievable sub-goals from the instructions and then completes the associated sub-tasks with a pre-trained RL policy. The proposed method formed the RL baseline at the IGLU 2022 competition.
