Paper Title
One-Shot Neural Fields for 3D Object Understanding
Paper Authors
Paper Abstract
We present a unified and compact scene representation for robotics, where each object in the scene is depicted by a latent code capturing geometry and appearance. This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction (e.g., recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. We build our representation from a single RGB input image at test time by leveraging recent advances in Neural Radiance Fields (NeRF) that learn category-level priors on large multiview datasets and then fine-tune on novel objects from one or a few views. We extend the NeRF model with additional grasp outputs and explore ways to leverage this representation for robotics. At test time, we build the representation from a single RGB input image observing the scene from only one viewpoint. We find that the recovered representation allows rendering from novel views, including views of occluded object parts, as well as prediction of successful stable grasps. Grasp poses can be decoded directly from our latent representation with an implicit grasp decoder. We experiment in both simulation and the real world and demonstrate robust robotic grasping using this compact representation. Website: https://nerfgrasp.github.io
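To make the described architecture concrete, below is a minimal PyTorch sketch of a latent-conditioned neural field extended with a grasp output head, in the spirit of "expand the NeRF model for additional grasp outputs." This is not the authors' implementation: the class name `LatentConditionedField`, the layer sizes, and the single scalar grasp head are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): a per-object latent code z is
# decoded, jointly with a 3D query point, into NeRF-style density and color plus
# an extra grasp-quality output. All names and dimensions are hypothetical.
import torch
import torch.nn as nn

class LatentConditionedField(nn.Module):
    """Decodes a per-object latent code into density, color, and a grasp score."""

    def __init__(self, latent_dim: int = 128, pos_dim: int = 3, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(latent_dim + pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)  # volume density (sigma)
        self.color_head = nn.Linear(hidden, 3)    # RGB radiance
        self.grasp_head = nn.Linear(hidden, 1)    # grasp-quality logit (assumed form)

    def forward(self, xyz: torch.Tensor, z: torch.Tensor):
        # xyz: (N, 3) query points; z: (latent_dim,) per-object latent code.
        h = self.trunk(torch.cat([xyz, z.expand(xyz.shape[0], -1)], dim=-1))
        sigma = torch.relu(self.density_head(h))
        rgb = torch.sigmoid(self.color_head(h))
        grasp_logit = self.grasp_head(h)
        return sigma, rgb, grasp_logit

# Example query: decode density/color/grasp at sampled 3D points for one object.
field = LatentConditionedField()
z = torch.randn(128)       # per-object latent code (would be fit at test time)
pts = torch.rand(1024, 3)  # sample points, e.g., along camera rays
sigma, rgb, grasp_logit = field(pts, z)
```

Because grasp quality is decoded pointwise from the same latent code as geometry and appearance, one representation can serve rendering, reconstruction, and grasp prediction, which is the unification the abstract claims.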
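The abstract also describes fitting the representation to a novel object from a single RGB view at test time. The sketch below, building on `LatentConditionedField` above, shows one plausible reading: freeze the category-level decoder (the learned prior) and optimize only the latent code against the observed view. The volumetric rendering loss is replaced by a stand-in MSE on decoded colors at placeholder points, purely to keep the example self-contained; the real method would render rays and compare to observed pixels.

```python
# Hedged sketch of test-time adaptation (assumed procedure, not the paper's code).
field = LatentConditionedField()
for p in field.parameters():
    p.requires_grad_(False)                   # keep the category-level prior fixed

z = torch.zeros(128, requires_grad=True)      # latent code for the novel object
opt = torch.optim.Adam([z], lr=1e-2)

pts = torch.rand(512, 3)                      # stand-in: points from the input view's rays
target_rgb = torch.rand(512, 3)               # stand-in: observed pixel colors

for step in range(100):
    _, rgb, _ = field(pts, z)
    loss = torch.nn.functional.mse_loss(rgb, target_rgb)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After this fit, the same optimized code `z` can be queried from any viewpoint, which is consistent with the abstract's claim that novel views, including occluded parts, can be rendered from a single input image.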