Paper Title


DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

Paper Authors

Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Bo Wu, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan

Abstract


Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving. While prevalent multi-modal methods simply decorate raw lidar point clouds with camera features and feed them directly to existing 3D detection models, our study shows that fusing camera features with deep lidar features, instead of with raw points, can lead to better performance. However, as those features are often augmented and aggregated, a key challenge in fusion is how to effectively align the transformed features from the two modalities. In this paper, we propose two novel techniques: InverseAug, which inverses geometry-related augmentations, e.g., rotation, to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign, which leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion. Based on InverseAug and LearnableAlign, we develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods. For example, DeepFusion improves the PointPillars, CenterPoint, and 3D-MAN baselines on Pedestrian detection by 6.7, 8.9, and 6.2 LEVEL_2 APH, respectively. Notably, our models achieve state-of-the-art performance on the Waymo Open Dataset, and show strong model robustness against input corruptions and out-of-distribution data. Code will be publicly available at https://github.com/tensorflow/lingvo/tree/master/lingvo/.
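As a rough illustration of the two ideas in the abstract, the sketch below shows (a) undoing a z-axis rotation augmentation so augmented lidar points can be projected with the original camera calibration, the intuition behind InverseAug, and (b) single-query cross-attention that lets one lidar voxel feature attend over candidate camera-pixel features, the intuition behind LearnableAlign. This is a minimal NumPy sketch; all function names, shapes, and weight matrices here are hypothetical illustrations, not the DeepFusion/Lingvo implementation.

```python
import numpy as np

def inverse_aug_align(aug_points, rot_angle_z):
    """Map lidar points augmented by a z-axis rotation back to the
    original sensor frame (InverseAug intuition): apply the inverse
    rotation before projecting points onto the camera image."""
    c, s = np.cos(-rot_angle_z), np.sin(-rot_angle_z)
    inv_rot = np.array([[c, -s, 0.0],
                        [s,  c, 0.0],
                        [0.0, 0.0, 1.0]])
    return aug_points @ inv_rot.T  # (n, 3) points in the original frame

def learnable_align(lidar_feat, img_feats, Wq, Wk, Wv):
    """Single-query cross-attention (LearnableAlign intuition):
    a lidar feature (d,) queries n candidate camera features (n, d)
    and returns their attention-weighted fusion (d,)."""
    q = lidar_feat @ Wq                      # query from the lidar feature
    k = img_feats @ Wk                       # keys from camera features
    v = img_feats @ Wv                       # values from camera features
    logits = k @ q / np.sqrt(q.shape[-1])    # scaled dot-product scores
    w = np.exp(logits - logits.max())        # numerically stable softmax
    w = w / w.sum()
    return w @ v                             # weighted camera feature
```

With geometric augmentations undone, each lidar point lands on its true image pixel, so the attention in `learnable_align` operates on correctly paired features rather than misaligned ones.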
