Paper Title

Visual Identification of Articulated Object Parts

Authors

Vicky Zeng, Tabitha Edith Lee, Jacky Liang, Oliver Kroemer

Abstract

As autonomous robots interact with and navigate around real-world environments such as homes, it is useful to reliably identify and manipulate articulated objects, such as doors and cabinets. Many prior works on object articulation identification require manipulation of the object, either by the robot or by a human. While recent works have addressed predicting articulation types from visual observations alone, they often assume prior knowledge of category-level kinematic motion models or a sequence of observations in which the articulated parts move according to their kinematic constraints. In this work, we propose FormNet, a neural network that identifies the articulation mechanisms between pairs of object parts from a single frame of an RGB-D image and segmentation masks. The network is trained on 100k synthetic images of 149 articulated objects from 6 categories. Synthetic images are rendered via a photorealistic simulator with domain randomization. Our proposed model predicts motion residual flows of object parts, and these flows are used to determine the articulation type and parameters. The network achieves an articulation type classification accuracy of 82.5% on novel object instances in trained categories. Experiments also show how this method enables generalization to novel categories and can be applied to real-world images without fine-tuning.
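
The abstract describes the model only at a high level: a single RGB-D frame plus segmentation masks for a pair of object parts goes in, and a dense motion residual flow plus an articulation-type prediction comes out. The following is a minimal, hypothetical PyTorch sketch of that input/output structure, not the authors' FormNet implementation; the encoder architecture, layer sizes, flow representation, and the three articulation classes (fixed, revolute, prismatic) are illustrative assumptions.

# Hypothetical sketch (not the authors' code): a network that takes a single
# RGB-D frame plus segmentation masks for a pair of object parts and predicts
# (a) a dense motion residual flow and (b) an articulation-type label.
# Layer sizes, encoder design, and the 3 articulation classes are assumptions.
import torch
import torch.nn as nn

class ArticulationNetSketch(nn.Module):
    def __init__(self, num_articulation_types: int = 3):
        super().__init__()
        # Input channels: RGB (3) + depth (1) + two binary part masks (2) = 6.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Dense head: per-pixel 3D motion residual flow, upsampled back to input size.
        self.flow_head = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 1),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
        )
        # Global head: articulation-type logits from pooled features.
        self.type_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, num_articulation_types),
        )

    def forward(self, rgbd: torch.Tensor, part_masks: torch.Tensor):
        # rgbd: (B, 4, H, W); part_masks: (B, 2, H, W) for the part pair.
        x = torch.cat([rgbd, part_masks], dim=1)
        feats = self.encoder(x)
        flow = self.flow_head(feats)         # (B, 3, H, W) residual flow
        type_logits = self.type_head(feats)  # (B, num_types)
        return flow, type_logits

# Example usage on a dummy 128x128 frame.
model = ArticulationNetSketch()
rgbd = torch.randn(2, 4, 128, 128)
masks = (torch.rand(2, 2, 128, 128) > 0.5).float()
flow, logits = model(rgbd, masks)
print(flow.shape, logits.shape)  # torch.Size([2, 3, 128, 128]) torch.Size([2, 3])

The flow head is kept dense and per-pixel so that, as in the abstract's description, the joint type and its parameters could subsequently be estimated from the predicted flow of the moving part; how that estimation is performed is not specified here.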
