Paper Title
CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild
Paper Authors
Paper Abstract
In this paper, we tackle the problem of category-level 9D pose estimation in the wild, given a single RGB-D frame. Collecting supervised real-world 9D pose data is tedious and error-prone, and models trained on it fail to generalize to unseen scenarios. Besides, category-level pose estimation requires a method that generalizes to unseen objects at test time, which is also challenging. Drawing inspiration from traditional point pair features (PPFs), we design a novel Category-level PPF (CPPF) voting method to achieve accurate, robust and generalizable 9D pose estimation in the wild. To obtain robust pose estimates, we sample numerous point pairs on an object, and for each pair our model predicts the necessary SE(3)-invariant voting statistics for the object center, orientation and scale. A novel coarse-to-fine voting algorithm is proposed to eliminate noisy point pair samples and generate the final prediction from the population of votes. To get rid of false positives in the orientation voting process, an auxiliary binary disambiguating classification task is introduced for each sampled point pair. To detect objects in the wild, we carefully design a sim-to-real pipeline that trains on synthetic point clouds only, unless objects have geometrically ambiguous poses, in which case color information is leveraged to disambiguate them. Results on standard benchmarks show that our method is on par with the current state of the art trained on real-world data. Extensive experiments further show that our method is robust to noise and gives promising results under extremely challenging scenarios. Our code is available at https://github.com/qq456cvb/CPPF.
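The abstract names two key ingredients: SE(3)-invariant features computed on sampled point pairs, and a voting scheme that aggregates per-pair predictions into an object-center estimate. The sketch below is only an illustration of that idea under simplifying assumptions, not the released implementation: `predict_offsets` is a hypothetical stand-in for the learned model, and the simple grid accumulator stands in for the coarse stage of the paper's coarse-to-fine voting.

```python
# Minimal sketch (not the authors' code) of PPF-style voting for an object center.
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """Classic point pair feature: pair distance plus three angles, all SE(3)-invariant."""
    d = p2 - p1
    dist = np.linalg.norm(d) + 1e-9
    d_hat = d / dist
    ang = lambda a, b: np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return np.array([dist, ang(n1, d_hat), ang(n2, d_hat), ang(n1, n2)])

def vote_center(points, normals, predict_offsets, num_pairs=1024, bin_size=0.01):
    """Toy coarse voting: every sampled pair casts one vote for the object center.

    `predict_offsets(feat)` is assumed to return (mu, nu): the signed offset of the
    center along the pair direction and its distance from that axis. For simplicity
    this sketch keeps only the on-axis component and accumulates votes on a voxel grid.
    """
    n = len(points)
    idx = np.random.randint(0, n, size=(num_pairs, 2))
    votes = []
    for i, j in idx:
        p1, p2 = points[i], points[j]
        d = p2 - p1
        d_hat = d / (np.linalg.norm(d) + 1e-9)
        mu, _ = predict_offsets(point_pair_feature(p1, normals[i], p2, normals[j]))
        votes.append(p1 + mu * d_hat)  # coarse vote: projection of the center onto the pair axis
    votes = np.asarray(votes)
    # Pick the densest voxel of the vote cloud as the coarse center estimate.
    keys, counts = np.unique(np.round(votes / bin_size).astype(int), axis=0, return_counts=True)
    return keys[np.argmax(counts)] * bin_size
```

Because every quantity fed to the predictor is invariant to rigid transformations of the input cloud, the same trained model can be applied to unseen instances and poses; the paper additionally refines the coarse grid maximum and votes on orientation and scale in an analogous way.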