Paper Title

Visual Localization via Few-Shot Scene Region Classification

Paper Authors

Siyan Dong, Shuzhe Wang, Yixin Zhuang, Juho Kannala, Marc Pollefeys, Baoquan Chen

Paper Abstract

Visual (re)localization addresses the problem of estimating the 6-DoF (Degree of Freedom) camera pose of a query image captured in a known scene, which is a key building block of many computer vision and robotics applications. Recent advances in structure-based localization solve this problem by memorizing the mapping from image pixels to scene coordinates with neural networks, building 2D-3D correspondences for camera pose optimization. However, such memorization requires training with large amounts of posed images in each scene, which is heavy and inefficient. In contrast, few-shot images are usually sufficient to cover the main regions of a scene for a human operator to perform visual localization. In this paper, we propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images. Our insight is to leverage a) a pre-learned feature extractor, b) a scene region classifier, and c) a meta-learning strategy to accelerate training while mitigating overfitting. We evaluate our method on both indoor and outdoor benchmarks. The experiments validate the effectiveness of our method in the few-shot setting, and the training time is significantly reduced to only a few minutes. Code is available at: https://github.com/siyandong/SRC
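To make the pipeline described in the abstract concrete, below is a minimal, hypothetical Python sketch of structure-based localization via scene region classification: a frozen pre-learned feature extractor, a lightweight per-pixel region classifier trained per scene, and camera pose recovery from the resulting 2D-3D correspondences with PnP + RANSAC (OpenCV's solvePnPRansac). The number of regions, the feature dimension and stride, and the use of region centroids as 3D anchors are all illustrative assumptions, not the authors' implementation, and the meta-learning component is omitted; see the linked SRC repository for the actual code.

    # Hypothetical sketch of localization via scene region classification.
    # All shapes, hyperparameters, and the region-centroid heuristic are
    # assumptions for illustration only.

    import cv2
    import numpy as np
    import torch
    import torch.nn as nn

    NUM_REGIONS = 64  # assumed: the scene is partitioned into 64 regions


    class SceneRegionClassifier(nn.Module):
        """Per-pixel region classifier on top of a frozen, pre-learned extractor."""

        def __init__(self, backbone: nn.Module, feat_dim: int = 256):
            super().__init__()
            self.backbone = backbone  # pre-learned across scenes, kept frozen
            for p in self.backbone.parameters():
                p.requires_grad = False
            # Only this 1x1 classification head is trained per scene.
            self.head = nn.Conv2d(feat_dim, NUM_REGIONS, kernel_size=1)

        def forward(self, image: torch.Tensor) -> torch.Tensor:
            feats = self.backbone(image)  # (B, feat_dim, H/8, W/8), assumed stride 8
            return self.head(feats)       # (B, NUM_REGIONS, H/8, W/8) region logits


    def estimate_pose(logits: torch.Tensor,
                      region_centers: np.ndarray,  # (NUM_REGIONS, 3) 3D centroids
                      K: np.ndarray,               # (3, 3) camera intrinsics
                      stride: int = 8,
                      min_conf: float = 0.5):
        """Turn per-pixel region predictions into 2D-3D matches and solve PnP."""
        probs = logits.softmax(dim=1)[0]   # (NUM_REGIONS, h, w)
        conf, labels = probs.max(dim=0)    # best region per feature-map cell
        ys, xs = torch.nonzero(conf > min_conf, as_tuple=True)
        if len(xs) < 4:                    # PnP needs at least 4 correspondences
            return None

        # 2D points: centers of the feature-map cells in image coordinates.
        pts2d = torch.stack([xs, ys], dim=1).float() * stride + stride / 2
        # 3D points: centroid of the predicted region (a coarse scene coordinate).
        pts3d = region_centers[labels[ys, xs].cpu().numpy()]

        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            pts3d.astype(np.float64),
            pts2d.cpu().numpy().astype(np.float64),
            K, distCoeffs=None, reprojectionError=8.0)
        return (rvec, tvec) if ok else None

Under these assumptions, few-shot scene memorization reduces to fitting only the small classification head on a handful of posed images, which is consistent with the abstract's claim of training times of only a few minutes.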
