Paper Title
MOTSLAM: MOT-assisted monocular dynamic SLAM using single-view depth estimation

Authors

Zhang, Hanwei, Uchiyama, Hideaki, Ono, Shintaro, Kawasaki, Hiroshi

Abstract
Visual SLAM systems targeting static scenes have been developed with satisfactory accuracy and robustness. Dynamic 3D object tracking has since become a significant capability in visual SLAM, driven by the need to understand dynamic surroundings in scenarios including autonomous driving and augmented and virtual reality. However, performing dynamic SLAM solely with monocular images remains challenging due to the difficulty of associating dynamic features and estimating their positions. In this paper, we present MOTSLAM, a dynamic visual SLAM system with a monocular configuration that tracks both the poses and bounding boxes of dynamic objects. MOTSLAM first performs multiple object tracking (MOT) with associated 2D and 3D bounding box detections to create initial 3D objects. Then, neural-network-based monocular depth estimation is applied to obtain the depth of dynamic features. Finally, camera poses, object poses, and both static and dynamic map points are jointly optimized using a novel bundle adjustment. Our experiments on the KITTI dataset demonstrate that our system achieves the best performance in both camera ego-motion estimation and object tracking among monocular dynamic SLAM methods.
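The abstract's depth step, lifting a 2D dynamic feature to a 3D point using a single-view depth estimate, is in essence a pinhole back-projection. The sketch below is illustrative only and not the paper's implementation; the function name `backproject` and the KITTI-like intrinsic matrix `K` are assumptions for the example.

```python
import numpy as np

def backproject(u, v, depth, K):
    """Lift pixel (u, v) with an estimated depth to a 3D point in camera coordinates.

    Assumes a pinhole camera with intrinsic matrix K (no lens distortion).
    """
    fx, fy = K[0, 0], K[1, 1]  # focal lengths in pixels
    cx, cy = K[0, 2], K[1, 2]  # principal point
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Hypothetical KITTI-like intrinsics, used only for this illustration.
K = np.array([[718.856,   0.0,   607.1928],
              [  0.0,   718.856, 185.2157],
              [  0.0,     0.0,     1.0   ]])

# A feature at the principal point with depth 10 m lies on the optical axis.
point = backproject(607.1928, 185.2157, 10.0, K)
print(point)  # [ 0.  0. 10.]
```

Once dynamic features have 3D positions like this, they can enter the joint bundle adjustment alongside static map points, camera poses, and object poses, as the abstract describes.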