HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection

Paper Title

HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection

Paper Authors

Maosheng Ye, Shuangjie Xu, Tongyi Cao

Abstract

We present Hybrid Voxel Network (HVNet), a novel one-stage unified network for point cloud based 3D object detection in autonomous driving. Recent studies show that 2D voxelization with a per-voxel PointNet-style feature extractor leads to accurate and efficient detectors for large 3D scenes. Since the size of the feature map determines the computation and memory cost, the voxel size becomes a parameter that is hard to balance. A smaller voxel size gives better performance, especially for small objects, but a longer inference time. A larger voxel can cover the same area with a smaller feature map, but fails to capture intricate features and accurate locations for smaller objects. We present a Hybrid Voxel Network that solves this problem by fusing voxel feature encoders (VFE) of different scales at the point-wise level and projecting them into multiple pseudo-image feature maps. We further propose an attentive voxel feature encoding that outperforms plain VFE, and a feature fusion pyramid network to aggregate multi-scale information at the feature map level. Experiments on the KITTI benchmark show that a single HVNet achieves the best mAP among all existing methods with a real-time inference speed of 31 Hz.
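
The voxel-size trade-off described in the abstract can be made concrete with a small sketch. The code below is not the authors' implementation: the bird's-eye-view ranges, the voxel sizes, and the helper names (`grid_shape`, `voxel_indices`, `multi_scale_indices`) are all assumptions chosen for illustration. It only shows how the pseudo-image grid grows as voxels shrink, and how each point can be assigned a voxel index at several scales, which is the kind of bookkeeping a multi-scale, point-wise fusion scheme would need.

```python
import numpy as np

# Illustrative bird's-eye-view crop in metres (an assumption, not taken from the paper).
X_RANGE = (0.0, 80.0)
Y_RANGE = (-40.0, 40.0)


def grid_shape(voxel_size):
    """Number of BEV cells along x and y for square voxels of the given size."""
    nx = int(round((X_RANGE[1] - X_RANGE[0]) / voxel_size))
    ny = int(round((Y_RANGE[1] - Y_RANGE[0]) / voxel_size))
    return nx, ny


def voxel_indices(points, voxel_size):
    """Map each point (x, y, ...) to the flat index of the BEV voxel containing it."""
    nx, ny = grid_shape(voxel_size)
    ix = np.floor((points[:, 0] - X_RANGE[0]) / voxel_size).astype(np.int64)
    iy = np.floor((points[:, 1] - Y_RANGE[0]) / voxel_size).astype(np.int64)
    # Clamp points that sit exactly on the upper boundary into the last cell.
    ix = np.clip(ix, 0, nx - 1)
    iy = np.clip(iy, 0, ny - 1)
    return ix * ny + iy


def multi_scale_indices(points, voxel_sizes):
    """Per-point voxel index at every scale, keyed by voxel size."""
    return {size: voxel_indices(points, size) for size in voxel_sizes}


# Hypothetical voxel sizes: each halving of the size quadruples the cell count.
for size in (0.8, 0.4, 0.2):
    nx, ny = grid_shape(size)
    print(f"voxel {size} m -> grid {nx} x {ny} = {nx * ny} cells")
```

Under these assumed ranges, halving the voxel edge length quadruples the number of cells in the feature map, which is why a single fine voxel size is costly. Keeping a per-point index at each scale is one way to let features computed at different voxel sizes be combined at the point level before they are scattered into separate pseudo-images.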
