Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild

Paper Title

Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild

Authors

Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin Johnson, Georgia Gkioxari

Abstract

Recognizing scenes and objects in 3D from a single image is a longstanding goal of computer vision with applications in robotics and AR/VR. For 2D recognition, large datasets and scalable solutions have led to unprecedented advances. In 3D, existing benchmarks are small in size and approaches specialize in few object categories and specific domains, e.g. urban driving scenes. Motivated by the success of 2D recognition, we revisit the task of 3D object detection by introducing a large benchmark, called Omni3D. Omni3D re-purposes and combines existing datasets resulting in 234k images annotated with more than 3 million instances and 98 categories. 3D detection at such scale is challenging due to variations in camera intrinsics and the rich diversity of scene and object types. We propose a model, called Cube R-CNN, designed to generalize across camera and scene types with a unified approach. We show that Cube R-CNN outperforms prior works on the larger Omni3D and existing benchmarks. Finally, we prove that Omni3D is a powerful dataset for 3D object recognition and show that it improves single-dataset performance and can accelerate learning on new smaller datasets via pre-training.
