Paper Title
Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World
Paper Authors
Paper Abstract
Monocular visual odometry (VO) has attracted extensive research attention by providing real-time vehicle motion from cost-effective camera images. However, state-of-the-art optimization-based monocular VO methods suffer from the scale inconsistency problem for long-term predictions. Deep learning has recently been introduced to address this issue by leveraging stereo sequences or ground-truth motions in the training dataset. However, this comes at an additional data-collection cost, and such training data may not be available in all datasets. In this work, we propose VRVO, a novel framework for retrieving the absolute scale from virtual data that can be easily obtained from modern simulation environments, while requiring no stereo or ground-truth data in the real domain during either the training or inference phase. Specifically, we first train a scale-aware disparity network using both monocular real images and stereo virtual data. The virtual-to-real domain gap is bridged by an adversarial training strategy that maps images from both domains into a shared feature space. The resulting scale-consistent disparities are then integrated with a direct VO system by constructing a virtual stereo objective that ensures scale consistency over long trajectories. Additionally, to address the suboptimality caused by separating the optimization backend from the learning process, we further propose a mutual reinforcement pipeline that allows bidirectional information flow between learning and optimization, boosting the robustness and accuracy of both. We demonstrate the effectiveness of our framework on the KITTI and vKITTI2 datasets.
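The virtual stereo objective mentioned in the abstract can be illustrated with a minimal photometric-consistency sketch: given a predicted disparity map, synthesize the right view by warping the left image and penalize the difference from the observed right view. The function names below are ours, and the integer-rounded warp is a simplification (a real implementation would use differentiable bilinear sampling and robust losses such as SSIM), so treat this as a sketch of the idea rather than the paper's implementation.

```python
import numpy as np

def synthesize_right(left, disp):
    """Synthesize a right view from a rectified left image by shifting
    each pixel horizontally according to its (rounded) disparity:
    right[y, x] = left[y, x + disp[y, x]]."""
    h, w = left.shape
    right = np.zeros_like(left)
    for y in range(h):
        for x in range(w):
            src = x + int(round(disp[y, x]))
            if 0 <= src < w:
                right[y, x] = left[y, src]
    return right

def virtual_stereo_loss(left, right, disp):
    """Mean L1 photometric error between the warped left image and the
    observed right image; small when the disparity is scale-correct."""
    return float(np.mean(np.abs(synthesize_right(left, disp) - right)))

# Toy example: a horizontal intensity ramp viewed with a constant
# 2-pixel disparity. The correct disparity yields a lower loss than a
# wrongly scaled one, which is the signal the objective exploits.
left = np.tile(np.arange(16, dtype=float), (8, 1))
disp_true = np.full((8, 16), 2.0)
right = synthesize_right(left, disp_true)
print(virtual_stereo_loss(left, right, disp_true))            # low
print(virtual_stereo_loss(left, right, 2.5 * disp_true))      # higher
```

Because the loss compares against a fixed-baseline stereo pair, minimizing it ties the predicted disparities (and hence depths) to an absolute metric scale, which is what allows the virtual stereo data to resolve the scale ambiguity of monocular VO.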