Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments

Paper Title

Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments

Authors

Jacob Krantz, Stefan Lee

Abstract

Recent work in Vision-and-Language Navigation (VLN) has presented two environmental paradigms with differing realism -- the standard VLN setting built on topological environments where navigation is abstracted away, and the VLN-CE setting where agents must navigate continuous 3D environments using low-level actions. Despite sharing the high-level task and even the underlying instruction-path data, performance on VLN-CE lags behind VLN significantly. In this work, we explore this gap by transferring an agent from the abstract environment of VLN to the continuous environment of VLN-CE. We find that this sim-2-sim transfer is highly effective, improving over the prior state of the art in VLN-CE by +12% success rate. While this demonstrates the potential for this direction, the transfer does not fully retain the original performance of the agent in the abstract setting. We present a sequence of experiments to identify what differences result in performance degradation, providing clear directions for further improvement.
