Paper Title

ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments

Paper Authors

Hyounghun Kim, Abhay Zala, Graham Burri, Hao Tan, Mohit Bansal

Abstract

For embodied agents, navigation is an important ability but not an isolated goal. Agents are also expected to perform specific tasks after reaching the target location, such as picking up objects and assembling them into a particular arrangement. We combine Vision-and-Language Navigation, assembling of collected objects, and object referring expression comprehension, to create a novel joint navigation-and-assembly task, named ArraMon. During this task, the agent (similar to a PokeMON GO player) is asked to find and collect different target objects one-by-one by navigating based on natural language instructions in a complex, realistic outdoor environment, but then also ARRAnge the collected objects part-by-part in an egocentric grid-layout environment. To support this task, we implement a 3D dynamic environment simulator and collect a dataset (in English; and also extended to Hindi) with human-written navigation and assembling instructions, and the corresponding ground truth trajectories. We also filter the collected instructions via a verification stage, leading to a total of 7.7K task instances (30.8K instructions and paths). We present results for several baseline models (integrated and biased) and metrics (nDTW, CTC, rPOD, and PTC), and the large model-human performance gap demonstrates that our task is challenging and presents a wide scope for future work. Our dataset, simulator, and code are publicly available at: https://arramonunc.github.io
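
To make the navigation metrics above more concrete, here is a minimal sketch of normalized Dynamic Time Warping (nDTW), one of the metrics the paper reports, following its standard formulation (Ilharco et al., 2019). The 2D-position representation and the success-distance threshold value are illustrative assumptions, not details taken from this paper.

```python
import math

def ndtw(reference, predicted, success_threshold=3.0):
    """Normalized Dynamic Time Warping (nDTW) between a reference path
    and a predicted path, as a trajectory-fidelity score in (0, 1].

    Paths are sequences of (x, y) positions; higher scores mean the
    predicted trajectory aligns more closely with the reference.
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    n, m = len(predicted), len(reference)
    # dtw[i][j]: minimal cumulative cost of aligning the first i
    # predicted steps with the first j reference steps.
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(predicted[i - 1], reference[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j],      # repeat a reference step
                                   dtw[i][j - 1],      # repeat a predicted step
                                   dtw[i - 1][j - 1])  # advance both paths
    # Normalize by reference length and the success-distance threshold,
    # then map the cumulative cost into (0, 1] with an exponential.
    return math.exp(-dtw[n][m] / (m * success_threshold))
```

An identical predicted path scores 1.0, and the score decays toward 0 as the trajectory drifts from the reference, which is why nDTW rewards following the instructed path rather than merely reaching the goal.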
