Title
Improving the generalization of network based relative pose regression: dimension reduction as a regularizer
Authors
Abstract
Visual localization occupies an important position in many areas such as Augmented Reality, robotics, and 3D reconstruction. State-of-the-art visual localization methods perform pose estimation using geometry-based solvers within the RANSAC framework. However, these methods require accurate pixel-level matching at high image resolution, which is hard to satisfy under significant changes in appearance, dynamics, or viewpoint. End-to-end learning-based regression networks provide a solution that circumvents the requirement for precise pixel-level correspondences, but demonstrate poor cross-scene generalization. In this paper, we explicitly add a learnable matching layer to the network to isolate the pose regression solver from absolute image feature values, and apply dimension regularization on both the correlation feature channels and the image scale to further improve generalization and robustness to large viewpoint changes. We implement this dimension regularization strategy within a two-layer pyramid-based framework to regress the localization result from coarse to fine. In addition, depth information is fused for absolute translation scale recovery. Through experiments on real-world RGB-D datasets, we validate the effectiveness of our design in improving both generalization performance and robustness to viewpoint change, and also show the potential of regression-based visual localization networks in challenging scenarios that are difficult for geometry-based visual localization methods.
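The abstract's core idea — correlating query and reference features so the pose regressor never sees absolute feature values, then reducing the dimensionality of the resulting correlation channels — can be sketched roughly as below. This is a minimal illustrative sketch in numpy, not the paper's implementation: the function names, shapes, and the fixed projection matrix standing in for a learned dimension-reduction layer are all assumptions.

```python
import numpy as np

def correlation_volume(f1, f2):
    """All-pairs correlation between two feature maps of shape (C, H, W).

    The matching layer correlates features so downstream regression
    depends on relative similarities rather than absolute feature values.
    Returns a (H, W, H*W) volume: one correlation "channel" per
    reference location, for each query pixel.
    """
    C, H, W = f1.shape
    a = f1.reshape(C, H * W)            # query features,     (C, N)
    b = f2.reshape(C, H * W)            # reference features, (C, N)
    corr = (a.T @ b) / np.sqrt(C)       # (N, N), scaled dot-product
    return corr.reshape(H, W, H * W)

def reduce_correlation_dim(corr, proj):
    """Project the H*W correlation channels down to k channels.

    `proj` (shape (H*W, k)) stands in for a learned projection; in the
    paper this dimension regularization is a trainable layer.
    """
    H, W, N = corr.shape
    return (corr.reshape(H * W, N) @ proj).reshape(H, W, -1)

# Toy usage: 8-channel 4x4 feature maps, reduced to k=3 channels.
rng = np.random.default_rng(0)
f_query = rng.standard_normal((8, 4, 4))
f_ref = rng.standard_normal((8, 4, 4))
corr = correlation_volume(f_query, f_ref)        # (4, 4, 16)
reduced = reduce_correlation_dim(corr, rng.standard_normal((16, 3)))  # (4, 4, 3)
```

In a pyramid framework, the same correlate-then-reduce step would run at a coarse scale first, with the finer level refining that initial estimate.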