Paper Title


You Only Label Once: 3D Box Adaptation from Point Cloud to Image via Semi-Supervised Learning

Paper Authors

Jieqi Shi, Peiliang Li, Xiaozhi Chen, Shaojie Shen

Abstract


The image-based 3D object detection task expects the predicted 3D bounding box to have a ``tight'' projection (also referred to as a cuboid), which fits the object contour well on the image while still preserving the geometric attributes in 3D space, e.g., physical dimensions, pairwise orthogonality, etc. These requirements bring significant challenges to annotation. Simply projecting Lidar-labeled 3D boxes onto the image leads to non-trivial misalignment, while directly drawing a cuboid on the image cannot access the original 3D information. In this work, we propose a learning-based 3D box adaptation approach that automatically adjusts a minimal set of parameters of the 360$^{\circ}$ Lidar 3D bounding box to perfectly fit the image appearance of panoramic cameras. With only a few 2D box annotations as guidance during the training phase, our network can produce accurate image-level cuboid annotations with 3D properties from Lidar boxes. We call our method ``you only label once'', meaning that labeling is done on the point cloud once and automatically adapted to all surrounding cameras. As far as we know, we are the first to focus on image-level cuboid refinement, which balances accuracy and efficiency well and dramatically reduces the labeling effort for accurate cuboid annotation. Extensive experiments on the public Waymo and NuScenes datasets show that our method can produce human-level cuboid annotations on the image without manual adjustment.
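To make the baseline the abstract criticizes concrete, here is a minimal sketch of naively projecting a Lidar-labeled 3D box into an image with a pinhole camera model. This is not the paper's adaptation network; the helper names (`box_corners_3d`, `project_to_image`), the intrinsic matrix `K`, and the extrinsic transform `T_cam_from_world` are illustrative assumptions. Calibration error or rolling-shutter effects in `T_cam_from_world` are exactly what makes such a direct projection misaligned with the object contour.

```python
import numpy as np

def box_corners_3d(center, size, yaw):
    """Return the 8 corners (3x8) of a yaw-rotated 3D box in the world/Lidar frame."""
    l, w, h = size
    # Corner offsets in the box frame (x forward, y left, z up).
    x = np.array([ l,  l,  l,  l, -l, -l, -l, -l]) / 2.0
    y = np.array([ w, -w,  w, -w,  w, -w,  w, -w]) / 2.0
    z = np.array([ h,  h, -h, -h,  h,  h, -h, -h]) / 2.0
    corners = np.vstack([x, y, z])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # rotation about the vertical axis
    return R @ corners + np.asarray(center, dtype=float).reshape(3, 1)

def project_to_image(corners, K, T_cam_from_world):
    """Project 3x8 world-frame corners into pixel coordinates (2x8)."""
    # Rigid transform into the camera frame, then pinhole projection.
    pts = T_cam_from_world[:3, :3] @ corners + T_cam_from_world[:3, 3:4]
    uv = K @ pts
    return uv[:2] / uv[2:]  # perspective divide by depth
```

Drawing the convex hull or edge set of these 8 projected points gives the naive cuboid; the abstract's point is that fitting it tightly to the image appearance requires adjusting the box parameters per camera rather than trusting this projection as-is.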
