Paper Title

MAFF-Net: Filter False Positive for 3D Vehicle Detection with Multi-modal Adaptive Feature Fusion

Authors

Zehan Zhang, Ming Zhang, Zhidong Liang, Xian Zhao, Ming Yang, Wenming Tan, ShiLiang Pu

Abstract


3D vehicle detection based on multi-modal fusion is an important task in many applications such as autonomous driving. Although significant progress has been made, we still observe two aspects that need further improvement. First, the specific gain that camera images can bring to 3D detection is seldom explored by previous works. Second, many fusion algorithms run slowly, while speed is essential for applications with high real-time requirements (e.g., autonomous driving). To this end, we propose an end-to-end trainable single-stage multi-modal feature adaptive network, which uses image information to effectively reduce false positives in 3D detection and has a fast detection speed. A multi-modal adaptive feature fusion module based on a channel attention mechanism is proposed to enable the network to adaptively use the features of each modality. Based on this mechanism, two fusion techniques are proposed to suit different usage scenarios: PointAttentionFusion is suitable for filtering simple false positives and is faster; DenseAttentionFusion is suitable for filtering more difficult false positives and has better overall performance. Experimental results on the KITTI dataset demonstrate significant improvement in filtering false positives over the approach using only point cloud data. Furthermore, the proposed method provides competitive results and has the fastest speed among the published state-of-the-art multi-modal methods on the KITTI benchmark.
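The core idea described above — squeezing concatenated point-cloud and image features into per-channel weights and reweighting each modality before fusion — can be sketched in plain NumPy. This is a minimal illustration of a squeeze-and-excitation-style channel attention gate, not the paper's implementation; the function name, the bottleneck ratio, and the final sum-fusion are assumptions for demonstration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_fuse(point_feat, image_feat, rng=None):
    """Hypothetical sketch of channel-attention-based multi-modal fusion.

    point_feat, image_feat: arrays of shape (C, H, W), one feature map per
    modality. The two maps are concatenated along channels, squeezed by
    global average pooling, passed through a small bottleneck MLP to
    predict per-channel gates in (0, 1), reweighted, and fused back to C
    channels by summing the two gated modalities.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.concatenate([point_feat, image_feat], axis=0)   # (2C, H, W)
    c2 = x.shape[0]
    squeezed = x.mean(axis=(1, 2))                         # (2C,) global average pool
    # Bottleneck MLP; random weights stand in for learned parameters.
    w1 = rng.standard_normal((c2 // 4, c2)) * 0.1
    w2 = rng.standard_normal((c2, c2 // 4)) * 0.1
    gates = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))   # (2C,) in (0, 1)
    reweighted = x * gates[:, None, None]                  # broadcast gates over H, W
    c = point_feat.shape[0]
    # Fuse: sum the gated point-cloud and image halves back to C channels.
    return reweighted[:c] + reweighted[c:]
```

In a trained network the bottleneck weights would be learned end-to-end, so the gates can suppress the image channels where the camera adds nothing and amplify them where they help reject false positives.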
