Paper title
Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions
Paper authors
Paper abstract
Using pre-trained language models to generate action plans for embodied agents is a promising research strategy. However, executing instructions in real or simulated environments requires verifying both the feasibility of actions and their relevance to completing the goal. We propose a new method that combines a language model with reinforcement learning for the task of building objects in a Minecraft-like environment according to natural language instructions. Our method first generates a set of consistently achievable sub-goals from the instructions and then completes the associated sub-tasks with a pre-trained RL policy. The proposed method served as the RL baseline at the IGLU 2022 competition.
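The two-stage idea in the abstract (propose sub-goals from an instruction, check that each is achievable, then hand it to a low-level policy) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than the authors' implementation: `propose_subgoals` replaces the language model with a toy rule, `run_policy` replaces the pre-trained RL policy with a direct write, and the grid size and the "block must be supported from below" feasibility rule are assumptions made only for illustration.

```python
from dataclasses import dataclass

# Assumed build-zone size as (x, height, z); illustrative only.
GRID = (11, 9, 11)


@dataclass(frozen=True)
class SubGoal:
    """One block placement: position plus color (hypothetical format)."""
    x: int
    y: int  # height
    z: int
    color: str


def propose_subgoals(instruction):
    """Toy stand-in for the language model.

    Only understands phrases like 'tower of 3 red blocks'.
    """
    words = instruction.lower().split()
    i = words.index("of")
    count, color = int(words[i + 1]), words[i + 2]
    return [SubGoal(5, y, 5, color) for y in range(count)]


def is_feasible(goal, placed):
    """Assumed feasibility rule: in bounds, cell free, supported from below."""
    pos = (goal.x, goal.y, goal.z)
    in_bounds = all(0 <= v < lim for v, lim in zip(pos, GRID))
    free = pos not in placed
    supported = goal.y == 0 or (goal.x, goal.y - 1, goal.z) in placed
    return in_bounds and free and supported


def run_policy(goal, placed):
    """Toy stand-in for the pre-trained RL policy: assume it always succeeds."""
    placed[(goal.x, goal.y, goal.z)] = goal.color


def build(instruction):
    """Generate sub-goals, skip infeasible ones, execute the rest in order."""
    placed = {}
    for goal in propose_subgoals(instruction):
        if is_feasible(goal, placed):
            run_policy(goal, placed)
    return placed


if __name__ == "__main__":
    print(build("build a tower of 3 red blocks"))
```

In this sketch the feasibility check sits between the planner and the executor, so an unreachable sub-goal is dropped before the policy is ever called. A real system would obtain the sub-goals from a learned model and execute them from pixel observations.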