Paper Title
OST: Efficient One-stream Network for 3D Single Object Tracking in Point Clouds
Paper Authors
Paper Abstract
Although recent Siamese network-based trackers have achieved impressive perceptual accuracy for single object tracking in LiDAR point clouds, they usually rely on heavy correlation operations that capture only category-level characteristics and overlook the inherent merit of arbitrariness, in contrast to multiple object tracking. In this work, we propose a radically novel one-stream network with the strength of instance-level encoding, which avoids the correlation operations used in previous Siamese networks and thus considerably reduces the computational effort. In particular, the proposed method mainly consists of a Template-aware Transformer Module (TTM) and a Multi-scale Feature Aggregation (MFA) module capable of fusing spatial and semantic information. The TTM stitches the specified template and the search region together and leverages an attention mechanism to establish the information flow, breaking the previous pattern of independent extraction-and-correlation. As a result, this module can directly generate template-aware features suited to the arbitrary and continuously changing nature of the target, enabling the model to deal with unseen categories. In addition, the MFA is proposed to make spatial and semantic information complementary to each other; it is characterized by reverse directional feature propagation that aggregates information from shallow to deep layers. Extensive experiments on KITTI and nuScenes demonstrate that our method achieves considerable performance not only for class-specific tracking but also for class-agnostic tracking, with less computation and higher efficiency.
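
To make the one-stream idea behind the TTM concrete, here is a minimal PyTorch sketch: template and search-region tokens are concatenated ("stitched") and processed by joint self-attention, so the search features become template-aware without a separate correlation step. All layer names, sizes, and the pre-norm layout are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class TemplateAwareTransformer(nn.Module):
    """Sketch of a template-aware attention block (assumed layout):
    template and search tokens are processed jointly in one stream,
    replacing the independent extraction-and-correlation pattern."""

    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, template_tokens, search_tokens):
        # (B, Nt, C) and (B, Ns, C) point features from a shared backbone
        x = torch.cat([template_tokens, search_tokens], dim=1)  # stitch
        # joint self-attention establishes the information flow between
        # the template and the search region in a single stream
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn(self.norm2(x))
        # keep only the (now template-aware) search-region features
        return x[:, template_tokens.shape[1]:, :]

# usage check of the sketch
ttm = TemplateAwareTransformer()
t = torch.randn(2, 64, 128)   # template point features
s = torch.randn(2, 256, 128)  # search-region point features
out = ttm(t, s)               # -> (2, 256, 128), template-aware
```

Because the template is an input token sequence rather than a fixed correlation kernel, the same weights can in principle serve arbitrary, unseen target categories, which is the class-agnostic property the abstract highlights.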
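The MFA module can likewise be sketched. The abstract only specifies that propagation runs in the reverse of the usual top-down direction, aggregating from shallow to deep layers; the fusion-by-addition scheme, channel widths, and the assumption that all stages share the same number of points (in practice, shallow features would be downsampled or interpolated to match) are mine.

```python
import torch
import torch.nn as nn

class MultiScaleFeatureAggregation(nn.Module):
    """Sketch of shallow-to-deep (reverse directional) aggregation:
    deep semantic features are progressively enriched with shallow
    spatial detail. Channel sizes are illustrative assumptions."""

    def __init__(self, channels=(64, 128, 256), out_dim=256):
        super().__init__()
        # 1x1 projections bring every backbone stage to a common width
        self.proj = nn.ModuleList(nn.Conv1d(c, out_dim, 1) for c in channels)
        self.fuse = nn.ModuleList(
            nn.Conv1d(out_dim, out_dim, 1) for _ in channels[1:]
        )

    def forward(self, feats):
        # feats: list of (B, C_i, N) tensors, shallowest stage first;
        # all stages assumed to share N points for this sketch
        x = self.proj[0](feats[0])
        for proj, fuse, f in zip(self.proj[1:], self.fuse, feats[1:]):
            # propagate accumulated shallow spatial cues into the
            # next (deeper, more semantic) stage
            x = fuse(x + proj(f))
        return x  # (B, out_dim, N), spatially and semantically fused

# usage check of the sketch
mfa = MultiScaleFeatureAggregation()
feats = [torch.randn(2, c, 256) for c in (64, 128, 256)]
out = mfa(feats)  # -> (2, 256, 256)
```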