Paper Title

Using Detection, Tracking and Prediction in Visual SLAM to Achieve Real-time Semantic Mapping of Dynamic Scenarios

Authors

Xingyu Chen, Jianru Xue, Jianwu Fang, Yuxin Pan, Nanning Zheng

Abstract

In this paper, we propose a lightweight system, RDS-SLAM, based on ORB-SLAM2, which can accurately estimate poses and build semantic maps at the object level for dynamic scenarios in real time, using only a single commonly available Intel Core i7 CPU. In RDS-SLAM, three major improvements, along with substantial architectural modifications, are proposed to overcome the limitations of ORB-SLAM2. First, it applies a lightweight object detection neural network to keyframes. Second, an efficient tracking and prediction mechanism is embedded into the system to remove feature points belonging to movable objects in all incoming frames. Third, a semantic octree map is built by probabilistic fusion of detection and tracking results, which enables a robot to maintain an object-level semantic description for potential interactions in dynamic scenarios. We evaluate RDS-SLAM on the TUM RGB-D dataset, and experimental results show that RDS-SLAM runs at 30.3 ms per frame in dynamic scenarios using only an Intel Core i7 CPU, and achieves accuracy comparable to state-of-the-art SLAM systems that rely heavily on both Intel Core i7 CPUs and powerful GPUs.
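The third improvement, probabilistically fusing per-frame detection and tracking results into a semantic octree map, can be sketched roughly as follows. This is a minimal illustration of Bayesian log-odds label fusion per voxel, not the authors' actual implementation; the class names, data structures, and update rule are assumptions for the example:

```python
import math
from collections import defaultdict

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

class SemanticVoxelMap:
    """Toy semantic map: each voxel accumulates per-class log-odds,
    fusing successive (assumed independent) detection observations."""

    def __init__(self):
        # voxel key (x, y, z) -> {class_label: accumulated log-odds}
        self.voxels = defaultdict(lambda: defaultdict(float))

    def fuse(self, voxel, label, confidence):
        # Bayesian log-odds update: independent observations simply add.
        self.voxels[voxel][label] += logit(confidence)

    def label(self, voxel):
        # Most likely class for this voxel, or None if never observed.
        scores = self.voxels.get(voxel)
        if not scores:
            return None
        return max(scores, key=scores.get)

m = SemanticVoxelMap()
m.fuse((1, 2, 3), "person", 0.9)  # detector reports "person", p = 0.9
m.fuse((1, 2, 3), "person", 0.8)  # second observation reinforces it
m.fuse((1, 2, 3), "chair", 0.6)   # a weaker, conflicting observation
print(m.label((1, 2, 3)))         # → person
```

Accumulating log-odds rather than raw probabilities keeps the per-voxel update a cheap addition, which matters when fusing labels into every observed voxel at frame rate.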
