Paper Title

Towards Long-Tailed 3D Detection

Paper Authors

Neehar Peri, Achal Dave, Deva Ramanan, Shu Kong

Paper Abstract

Contemporary autonomous vehicle (AV) benchmarks have advanced techniques for training 3D detectors, particularly on large-scale lidar data. Surprisingly, although semantic class labels naturally follow a long-tailed distribution, contemporary benchmarks focus on only a few common classes (e.g., pedestrian and car) and neglect many rare classes in-the-tail (e.g., debris and stroller). However, AVs must still detect rare classes to ensure safe operation. Moreover, semantic classes are often organized within a hierarchy, e.g., tail classes such as child and construction-worker are arguably subclasses of pedestrian. However, such hierarchical relationships are often ignored, which may lead to misleading estimates of performance and missed opportunities for algorithmic innovation. We address these challenges by formally studying the problem of Long-Tailed 3D Detection (LT3D), which evaluates on all classes, including those in-the-tail. We evaluate and innovate upon popular 3D detection codebases, such as CenterPoint and PointPillars, adapting them for LT3D. We develop hierarchical losses that promote feature sharing across common-vs-rare classes, as well as improved detection metrics that award partial credit to "reasonable" mistakes respecting the hierarchy (e.g., mistaking a child for an adult). Finally, we point out that fine-grained tail class accuracy is particularly improved via multimodal fusion of RGB images with LiDAR; simply put, small fine-grained classes are challenging to identify from sparse (lidar) geometry alone, suggesting that multimodal cues are crucial to long-tailed 3D detection. Our modifications improve accuracy by 5% AP on average for all classes, and dramatically improve AP for rare classes (e.g., stroller AP improves from 3.6 to 31.6)! Our code is available at https://github.com/neeharperi/LT3D
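The abstract's idea of awarding partial credit to "reasonable" mistakes that respect the class hierarchy can be illustrated with a minimal sketch. The hierarchy, credit values, and function below are hypothetical illustrations of the concept, not the paper's actual LT3D metric implementation:

```python
# A toy two-level class hierarchy: superclass -> set of fine-grained subclasses.
# Names are illustrative; the full nuScenes-style hierarchy is larger.
HIERARCHY = {
    "pedestrian": {"adult", "child", "construction-worker"},
    "vehicle": {"car", "truck", "trailer"},
}

def partial_credit(pred: str, gt: str) -> float:
    """Score a classification: 1.0 for an exact match, 0.5 for a sibling
    mistake within the same superclass (e.g., mistaking a child for an
    adult), and 0.0 for an unrelated class (the 0.5 value is arbitrary
    here, chosen only to demonstrate partial credit)."""
    if pred == gt:
        return 1.0
    for subclasses in HIERARCHY.values():
        if pred in subclasses and gt in subclasses:
            return 0.5  # "reasonable" mistake respecting the hierarchy
    return 0.0
```

Under such a scheme, a detector that confuses fine-grained pedestrian subclasses is penalized less than one that confuses a pedestrian with a vehicle, which better reflects the safety-relevant severity of the error.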
