Towards Safe, Real-Time Systems: Stereo vs Images and LiDAR for 3D Object Detection

Paper Title

Towards Safe, Real-Time Systems: Stereo vs Images and LiDAR for 3D Object Detection

Author

Levine, Matthew

Abstract

As object detectors rapidly improve, attention has expanded past image-only networks to include a range of 3D and multimodal frameworks, especially ones that incorporate LiDAR. However, due to cost, logistics, and even some safety considerations, stereo can be an appealing alternative. Towards understanding the efficacy of stereo as a replacement for monocular input or LiDAR in object detectors, we show that multimodal learning with traditional disparity algorithms can improve image-based results without increasing the number of parameters, and that learning over stereo error can impart similar 3D localization power to LiDAR in certain contexts. Furthermore, doing so also has calibration benefits with respect to image-only methods. We benchmark on the public dataset KITTI, and in doing so, reveal a few small but common algorithmic mistakes currently used in computing metrics on that set, and offer efficient, provably correct alternatives.
