Benchmarking Image Retrieval for Visual Localization

Paper Title

Benchmarking Image Retrieval for Visual Localization

Authors

Noé Pion, Martin Humenberger, Gabriela Csurka, Yohann Cabon, Torsten Sattler

Abstract

Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two tasks: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for these tasks. These algorithms are often trained for the goal of retrieving the same landmark under a large range of viewpoint changes. However, robustness to viewpoint changes is not necessarily desirable in the context of visual localization. This paper focuses on understanding the role of image retrieval for multiple visual localization tasks. We introduce a benchmark setup and compare state-of-the-art retrieval representations on multiple datasets. We show that retrieval performance on classical landmark retrieval/recognition tasks correlates only for some but not all tasks to localization performance. This indicates a need for retrieval approaches specifically designed for localization tasks. Our benchmark and evaluation protocols are available at https://github.com/naver/kapture-localization.
