Paper Title
SEFormer: Structure Embedding Transformer for 3D Object Detection
Paper Authors
Paper Abstract
Effectively preserving and encoding structure features from objects in irregular and sparse LiDAR points is a key challenge for 3D object detection on point clouds. Recently, Transformers have demonstrated promising performance on many 2D and even 3D vision tasks. Compared with fixed and rigid convolution kernels, the self-attention mechanism in a Transformer can adaptively exclude unrelated or noisy points and is thus suitable for preserving the local spatial structure of irregular LiDAR point clouds. However, a Transformer only performs a simple weighted sum over point features based on the self-attention mechanism, and all points share the same value transformation. Such an isotropic operation lacks the ability to capture the direction- and distance-oriented local structure that is important for 3D object detection. In this work, we propose a Structure-Embedding transFormer (SEFormer), which not only preserves local structure as a traditional Transformer does but can also encode it. Compared to the self-attention mechanism in a traditional Transformer, SEFormer learns different feature transformations for value points based on their relative directions and distances to the query point. We then propose a SEFormer-based network for high-performance 3D object detection. Extensive experiments show that the proposed architecture achieves SOTA results on the Waymo Open Dataset, the largest 3D detection benchmark for autonomous driving. Specifically, SEFormer achieves 79.02% mAP, 1.2% higher than existing works. We will release the code.
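To illustrate the core idea, the following is a minimal, hypothetical sketch (not the authors' implementation) of how attention can be made anisotropic: instead of one value projection shared by all points as in standard self-attention, each neighbor's value is projected by a matrix selected according to its relative direction around the query point. The function name, the angular binning scheme, and all parameter shapes are illustrative assumptions.

```python
import numpy as np

def seformer_attention_sketch(query, keys, values_in, rel_pos, W_v_bins, n_dir=4):
    """Hypothetical sketch of direction-aware attention for one query point.

    query     : (d,)      query feature
    keys      : (n, d)    neighbor key features
    values_in : (n, d)    neighbor value features (before projection)
    rel_pos   : (n, 2)    neighbor positions relative to the query (x, y)
    W_v_bins  : (n_dir, d, d) one value-projection matrix per direction bin
    """
    # Standard part: attention weights from scaled query-key dot products.
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Anisotropic part: choose each neighbor's value projection by the
    # angular bin of its relative position (a plain Transformer would
    # apply one shared projection to every neighbor instead).
    angles = np.arctan2(rel_pos[:, 1], rel_pos[:, 0])          # in [-pi, pi]
    bins = ((angles + np.pi) / (2 * np.pi) * n_dir).astype(int) % n_dir
    values = np.stack([W_v_bins[b] @ v for b, v in zip(bins, values_in)])

    # Weighted sum, as in ordinary attention.
    return weights @ values
```

Because the value projection depends on where a neighbor lies relative to the query, moving neighbors to different directions changes the output even when features and attention weights are unchanged, which is the structural sensitivity the abstract attributes to SEFormer.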