Paper Title
CXTrack: Improving 3D Point Cloud Tracking with Contextual Information
Paper Authors
Paper Abstract
3D single object tracking plays an essential role in many applications, such as autonomous driving. It remains a challenging problem due to the large appearance variation and the sparsity of points caused by occlusion and limited sensor capabilities. Therefore, contextual information across two consecutive frames is crucial for effective object tracking. However, points containing such useful information are often overlooked and cropped out in existing methods, leading to insufficient use of important contextual knowledge. To address this issue, we propose CXTrack, a novel transformer-based network for 3D object tracking, which exploits ConteXtual information to improve the tracking results. Specifically, we design a target-centric transformer network that directly takes point features from two consecutive frames and the previous bounding box as input to explore contextual information and implicitly propagate target cues. To achieve accurate localization for objects of all sizes, we propose a transformer-based localization head with a novel center embedding module to distinguish the target from distractors. Extensive experiments on three large-scale datasets, KITTI, nuScenes and Waymo Open Dataset, show that CXTrack achieves state-of-the-art tracking performance while running at 34 FPS.
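The core idea in the abstract, feeding points from two consecutive frames plus a cue derived from the previous bounding box into a single transformer so that contextual points are never cropped away, can be illustrated with a minimal sketch. The code below is a hypothetical PyTorch illustration, not the authors' CXTrack implementation; the class name, the binary box-mask cue encoding, and the toy voting-based localization head are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' released code) of a target-centric
# transformer: points from the previous and current frames are encoded jointly,
# with a per-point cue marking which previous-frame points lay inside the last
# bounding box, so attention can propagate target information across frames.
import torch
import torch.nn as nn


class TargetCentricSketch(nn.Module):
    def __init__(self, feat_dim=128, num_heads=4, num_layers=2):
        super().__init__()
        # Per-point input: 3D coordinates + a binary target cue (assumed encoding).
        self.embed = nn.Linear(3 + 1, feat_dim)
        layer = nn.TransformerEncoderLayer(feat_dim, num_heads,
                                           dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Toy localization head: per-point center offsets and target scores.
        self.center_head = nn.Linear(feat_dim, 3)
        self.score_head = nn.Linear(feat_dim, 1)

    def forward(self, pts_prev, pts_curr, prev_box_mask):
        """pts_prev, pts_curr: (B, N, 3); prev_box_mask: (B, N) in {0, 1},
        marking previous-frame points inside the last bounding box."""
        n_prev = pts_prev.shape[1]
        pts = torch.cat([pts_prev, pts_curr], dim=1)                    # (B, 2N, 3)
        cue = torch.cat([prev_box_mask,
                         torch.zeros_like(prev_box_mask)], dim=1)       # (B, 2N)
        x = self.embed(torch.cat([pts, cue.unsqueeze(-1)], dim=-1))
        x = self.encoder(x)                       # attention mixes context across both frames
        scores = self.score_head(x).squeeze(-1)                         # (B, 2N)
        centers = pts + self.center_head(x)                             # per-point center votes
        # Weighted vote over current-frame points only.
        w = torch.softmax(scores[:, n_prev:], dim=-1)
        center = (w.unsqueeze(-1) * centers[:, n_prev:]).sum(dim=1)
        return center                                                   # (B, 3) predicted center


if __name__ == "__main__":
    B, N = 2, 256
    model = TargetCentricSketch()
    out = model(torch.randn(B, N, 3), torch.randn(B, N, 3),
                (torch.rand(B, N) > 0.7).float())
    print(out.shape)  # torch.Size([2, 3])
```

Unlike template-crop pipelines, nothing in this sketch removes background points before the transformer; that is the contextual-information intuition the abstract describes, while the actual CXTrack architecture (X-shaped backbone, center embedding module, etc.) differs in its details.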