Paper Title

Sub-Instruction Aware Vision-and-Language Navigation

Paper Authors

Hong, Yicong, Rodriguez-Opazo, Cristian, Wu, Qi, Gould, Stephen

Paper Abstract

Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions. Despite significant advances, few previous works are able to fully utilize the strong correspondence between the visual and textual sequences. Meanwhile, due to the lack of intermediate supervision, the agent's performance at following each part of the instruction cannot be assessed during navigation. In this work, we focus on the granularity of the visual and language sequences as well as the traceability of agents through the completion of an instruction. We provide agents with fine-grained annotations during training and find that they are able to follow the instruction better and have a higher chance of reaching the target at test time. We enrich the benchmark dataset Room-to-Room (R2R) with sub-instructions and their corresponding paths. To make use of this data, we propose effective sub-instruction attention and shifting modules that select and attend to a single sub-instruction at each time-step. We implement our sub-instruction modules in four state-of-the-art agents, compare with their baseline models, and show that our proposed method improves the performance of all four agents. We release the Fine-Grained R2R dataset (FGR2R) and the code at https://github.com/YicongHong/Fine-Grained-R2R.
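The abstract describes attention and shifting modules that select and attend to a single sub-instruction at each time-step. As a rough illustration only (not the authors' implementation; the class, its method names, and the shift threshold are all illustrative assumptions), the idea can be sketched as a pointer into a list of sub-instruction token embeddings, with attention restricted to the current sub-instruction and an explicit shift signal advancing the pointer:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

class SubInstructionShifter:
    """Minimal sketch of sub-instruction attention and shifting.

    Tracks which sub-instruction the agent is currently following,
    attends only to that sub-instruction's tokens, and shifts to the
    next sub-instruction when a scalar shift signal exceeds a
    threshold. In the paper the shift signal would be predicted by
    the agent; here it is passed in directly.
    """

    def __init__(self, sub_instruction_tokens, shift_threshold=0.5):
        # sub_instruction_tokens: list of (num_tokens, dim) arrays,
        # one array per sub-instruction, in order.
        self.subs = sub_instruction_tokens
        self.idx = 0                      # index of the current sub-instruction
        self.threshold = shift_threshold  # illustrative value

    def attend(self, query):
        """Dot-product attention over tokens of the current sub-instruction."""
        tokens = self.subs[self.idx]      # (T, D)
        scores = tokens @ query           # (T,)
        weights = softmax(scores)
        return weights @ tokens           # (D,) context vector

    def maybe_shift(self, shift_signal):
        """Advance to the next sub-instruction if the signal fires;
        clamp at the last sub-instruction. Returns the new index."""
        if shift_signal > self.threshold and self.idx < len(self.subs) - 1:
            self.idx += 1
        return self.idx
```

At each navigation time-step the agent would call `attend` with its current state as the query, act on the resulting context vector, and then call `maybe_shift` to decide whether it has completed the current sub-instruction.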
