Paper title
LidarMultiNet: Towards a Unified Multi-Task Network for LiDAR Perception
Authors
Abstract
LiDAR-based 3D object detection, semantic segmentation, and panoptic segmentation are usually implemented in specialized networks with distinctive architectures that are difficult to adapt to each other. This paper presents LidarMultiNet, a LiDAR-based multi-task network that unifies these three major LiDAR perception tasks. Among its many benefits, a multi-task network can reduce the overall cost by sharing weights and computation among multiple tasks. However, it typically underperforms compared to independently combined single-task models. The proposed LidarMultiNet aims to bridge the performance gap between the multi-task network and multiple single-task networks. At the core of LidarMultiNet is a strong 3D voxel-based encoder-decoder architecture with a Global Context Pooling (GCP) module that extracts global contextual features from a LiDAR frame. Task-specific heads are added on top of the network to perform the three LiDAR perception tasks. More tasks can be implemented simply by adding new task-specific heads while introducing little additional cost. A second stage is also proposed to refine the first-stage segmentation and generate accurate panoptic segmentation results. LidarMultiNet is extensively tested on both the Waymo Open Dataset and the nuScenes dataset, demonstrating for the first time that major LiDAR perception tasks can be unified in a single strong network that is trained end-to-end and achieves state-of-the-art performance. Notably, LidarMultiNet reaches the official 1st place in the Waymo Open Dataset 3D semantic segmentation challenge 2022 with the highest mIoU and the best accuracy for most of the 22 classes on the test set, using only LiDAR points as input. It also sets a new state of the art for a single model on the Waymo 3D object detection benchmark and three nuScenes benchmarks.
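The shared-backbone idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyMultiTaskNet`, the helper names, the random "weights", the 7-value box encoding, and the mean-pooling stand-in for global context are all assumptions made for illustration, and the real GCP module and 3D voxel encoder-decoder are considerably more involved. The sketch only shows that the shared features are computed once per frame and reused by every head, and that a new task is added by registering another head.

```python
import random


def linear(in_dim, out_dim, rng):
    """Random in_dim x out_dim matrix standing in for a learned layer."""
    return [[rng.uniform(-1, 1) for _ in range(out_dim)] for _ in range(in_dim)]


def apply(features, weights):
    """Multiply every per-voxel feature vector by the weight matrix."""
    return [[sum(f * w for f, w in zip(vec, col)) for col in zip(*weights)]
            for vec in features]


def global_context_pool(features):
    """Toy stand-in for global context (an assumption, not the paper's GCP):
    append the mean feature over all voxels to each voxel's feature."""
    n = len(features)
    mean = [sum(col) / n for col in zip(*features)]
    return [vec + mean for vec in features]


class ToyMultiTaskNet:
    """Hypothetical shared encoder with task-specific heads."""

    def __init__(self, in_dim=4, hidden=8, num_classes=22, box_dim=7, seed=0):
        self.rng = random.Random(seed)
        self.feat_dim = 2 * hidden  # local feature + pooled global feature
        self.encoder = linear(in_dim, hidden, self.rng)  # shared weights
        self.heads = {
            "segmentation": linear(self.feat_dim, num_classes, self.rng),
            "detection": linear(self.feat_dim, box_dim, self.rng),
        }
        self.encoder_calls = 0  # counts how often shared features are computed

    def add_head(self, name, out_dim):
        """Adding a task only adds one head; the encoder is untouched."""
        self.heads[name] = linear(self.feat_dim, out_dim, self.rng)

    def forward(self, voxels):
        self.encoder_calls += 1
        shared = global_context_pool(apply(voxels, self.encoder))
        # Every head reuses the same shared features.
        return {name: apply(shared, w) for name, w in self.heads.items()}


net = ToyMultiTaskNet()
voxels = [[0.1 * i, 0.2, 0.3, 0.4] for i in range(5)]  # 5 voxels, 4 features each
outputs = net.forward(voxels)
for task, preds in outputs.items():
    print(task, len(preds), "voxels x", len(preds[0]), "outputs")
```

In this toy setup the cost of an extra task is one additional head matrix, while the encoder pass is shared, which mirrors the cost argument made in the abstract.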