Paper Title
Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation
Paper Authors
Paper Abstract
We humans can impeccably search for a target object, given its name only, even in an unseen environment. We argue that this ability is largely due to three main reasons: the incorporation of prior knowledge (or experience), its adaptation to the new environment using observed visual cues, and, most importantly, searching optimistically without giving up early. This last ability is currently missing in state-of-the-art visual navigation methods based on Reinforcement Learning (RL). In this paper, we propose to use externally learned prior knowledge of relative object locations and integrate it into our model by constructing a neural graph. To incorporate the graph efficiently without increasing the state-space complexity, we propose our Graph-based Value Estimation (GVE) module. GVE provides a more accurate baseline for estimating the advantage function in an actor-critic RL algorithm. This results in a reduced value estimation error and, consequently, convergence to a better policy. Through empirical studies, we show that our agent, dubbed the Optimistic Agent, maintains a more realistic estimate of the state value during a navigation episode, which leads to a higher success rate. Our extensive ablation studies demonstrate the efficacy of our simple method, which achieves state-of-the-art results measured by conventional visual navigation metrics, e.g. Success Rate (SR) and Success weighted by Path Length (SPL), in the AI2THOR environment.
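The abstract describes GVE as supplying a graph-derived state value that serves as the baseline in the advantage estimate of an actor-critic algorithm. The sketch below illustrates that idea only; the function names, the single message-passing step, the dot-product readout, and all tensor shapes are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def graph_value(state_feat, adjacency, obj_embed):
    """Illustrative value estimate: propagate object embeddings one step
    over a prior-knowledge object graph, pool them, and read out a scalar
    state value against the agent's visual state feature."""
    propagated = adjacency @ obj_embed      # (N, D): one round of message passing
    pooled = propagated.mean(axis=0)        # (D,): pool over objects
    return float(state_feat @ pooled)       # scalar V(s)

def advantage(returns, value):
    """A(s, a) = R - V(s): the return minus the baseline value estimate."""
    return returns - value

# Toy data: 4 objects with 8-dim embeddings, an 8-dim state feature.
rng = np.random.default_rng(0)
A = rng.random((4, 4))
A = A / A.sum(axis=1, keepdims=True)        # row-normalized adjacency
E = rng.random((4, 8))                      # object embeddings
s = rng.random(8)                           # visual state feature

v = graph_value(s, A, E)
adv = advantage(returns=1.0, value=v)
```

A more accurate baseline `v` shrinks the magnitude (and variance) of `adv`, which is the mechanism the abstract credits for the reduced value estimation error.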