Soft Expert Reward Learning for Vision-and-Language Navigation

Paper Title

Soft Expert Reward Learning for Vision-and-Language Navigation

Authors

Wang, Hu; Wu, Qi; Shen, Chunhua

Abstract

Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions. Dominant methods, based on supervised learning, clone experts' behaviours and thus perform better on seen environments while showing restricted performance on unseen ones. Reinforcement Learning (RL) based models show better generalisation ability but have issues of their own; one of them is the large amount of manual reward engineering they require. In this paper, we introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task. Our proposed method consists of two complementary components: a Soft Expert Distillation (SED) module, which encourages the agent to behave like an expert as much as possible, but in a soft fashion; and a Self Perceiving (SP) module, which aims to push the agent towards the final destination as fast as possible. Empirically, we evaluate our model on the VLN seen, unseen and test splits, and the model outperforms the state-of-the-art methods on most of the evaluation metrics.
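The abstract does not say how SED measures how expert-like a behaviour is. As an illustration only, the Python sketch below assumes a random-network-distillation-style scheme: a frozen, randomly initialised target network, a predictor fitted only on expert features, and a reward that decays smoothly with the predictor's error. The toy 2-D features, the hard-coded `TARGET_W`, the linear least-squares predictor, the `exp(-error^2)` mapping, and all function names are hypothetical simplifications, not the paper's implementation.

```python
import math

# Hypothetical toy setting: 2-D "state-action features". Real VLN features
# would be high-dimensional visual and instruction embeddings.
# Frozen "random" target network: weights drawn once at random and then
# fixed (hard-coded here so the example is reproducible).
TARGET_W = (0.8, -0.6)


def target(x):
    """Frozen random target network: a single tanh unit."""
    return math.tanh(TARGET_W[0] * x[0] + TARGET_W[1] * x[1])


def fit_predictor(expert_feats):
    """Fit a linear predictor to the target, using expert features only."""
    # Accumulate the 2x2 normal equations (X^T X) w = X^T y.
    sxx = sxy = syy = bx = by = 0.0
    for x in expert_feats:
        y = target(x)
        sxx += x[0] * x[0]
        sxy += x[0] * x[1]
        syy += x[1] * x[1]
        bx += x[0] * y
        by += x[1] * y
    det = sxx * syy - sxy * sxy
    # Solve the 2x2 system via its closed-form inverse.
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)


def soft_expert_reward(w, x):
    """Reward in (0, 1]: close to 1 where the predictor matches the target
    (i.e. x resembles the expert data), decaying smoothly elsewhere."""
    err = w[0] * x[0] + w[1] * x[1] - target(x)
    return math.exp(-err * err)


# Hypothetical "expert demonstrations": a small grid of features near the origin.
grid = [-0.3, -0.15, 0.0, 0.15, 0.3]
expert = [(u, v) for u in grid for v in grid]
w = fit_predictor(expert)

for x in [(0.15, -0.15), (1.2, -0.9), (4.0, -3.0)]:
    print(x, round(soft_expert_reward(w, x), 3))
```

Features inside the expert distribution score close to 1, moderately off-distribution features receive an intermediate reward, and features far from the expert data score near 0. The exponential mapping is what makes the signal graded ("soft") rather than a hard match/no-match against the expert trajectory.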
