Paper Title

MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare

Authors

Yann Labbé, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpentier, Mathieu Aubry, Dieter Fox, Josef Sivic

Abstract

We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method only assumes knowledge of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects. The shape and coordinate system of the novel object are provided as inputs to the network by rendering multiple synthetic views of the object's CAD model. Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner. Third, we introduce a large-scale synthetic dataset of photorealistic images of thousands of objects with diverse visual and shape properties and show that this diversity is crucial to obtain good generalization performance on novel objects. We train our approach on this large synthetic dataset and apply it without retraining to hundreds of novel objects in real images from several pose estimation benchmarks. Our approach achieves state-of-the-art performance on the ModelNet and YCB-Video datasets. An extensive evaluation on the 7 core datasets of the BOP challenge demonstrates that our approach achieves performance competitive with existing approaches that require access to the target objects during training. Code, dataset and trained models are available on the project page: https://megapose6d.github.io/.
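The two-stage pipeline described in the abstract (a coarse stage that scores candidate poses by comparing renderings against the observed image, followed by an iterative refiner) can be sketched in toy form. Everything below is a hypothetical simplification: the pose is a single scalar, the "renderer" and scoring function are hand-written stand-ins, and the coarse stage and refiner are simple searches rather than the learned networks MegaPose actually uses on full 6D poses and photorealistic renderings.

```python
# Toy render-and-compare sketch. All names and the 1-D "pose" are
# hypothetical simplifications of the method described in the abstract.

def render(pose):
    """Stand-in renderer: maps a scalar 'pose' to a tiny 'image' (a list)."""
    return [pose, 2.0 * pose, pose + 1.0]

def score(rendered, observed):
    """Higher is better: negative sum of absolute pixel differences."""
    return -sum(abs(r - o) for r, o in zip(rendered, observed))

def coarse_estimate(observed, candidates):
    """Coarse stage: rank candidate poses by their render-and-compare score
    against the observed image, as the classifier does in the paper."""
    return max(candidates, key=lambda p: score(render(p), observed))

def refine(pose, observed, steps=20, delta=0.1):
    """Refiner: iteratively nudge the pose so its rendering better matches
    the observation (the paper's refiner instead predicts pose updates)."""
    for _ in range(steps):
        neighbors = [pose - delta, pose, pose + delta]
        pose = max(neighbors, key=lambda p: score(render(p), observed))
    return pose

observed = render(3.14)  # pretend the object's true pose is 3.14
coarse = coarse_estimate(observed, [0.0, 1.0, 2.0, 3.0, 4.0])
refined = refine(coarse, observed)
```

In this toy, the coarse stage picks the best candidate from a sparse grid (3.0) and the refiner walks it toward the true pose in small steps, mirroring how the paper's coarse estimate only needs to land within the refiner's basin of correction.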
