Paper Title


What's in your hands? 3D Reconstruction of Generic Objects in Hands

Paper Authors

Yufei Ye, Abhinav Gupta, Shubham Tulsiani

Paper Abstract


Our work aims to reconstruct hand-held objects given a single RGB image. In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates. Our key insight is that hand articulation is highly predictive of the object shape, and we propose an approach that conditionally reconstructs the object based on the articulation and the visual input. Given an image depicting a hand-held object, we first use off-the-shelf systems to estimate the underlying hand pose and then infer the object shape in a normalized hand-centric coordinate frame. We parameterize the object by its signed distance field, which is inferred by an implicit network that leverages both visual features and articulation-aware coordinates to process each query point. We perform experiments across three datasets and show that our method consistently outperforms baselines and is able to reconstruct a diverse set of objects. We analyze the benefits and robustness of explicit articulation conditioning, and also show that it allows the hand pose estimate to be further improved through test-time optimization.
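To make the abstract's core idea concrete, below is a minimal sketch (not the authors' released code) of an articulation-conditioned implicit SDF network: each query point in the normalized hand-centric frame is augmented with its offsets to the hand joints (the "articulation-aware coordinates") and a per-point visual feature, and an MLP predicts its signed distance. The joint count, feature dimension, and network sizes are assumptions for illustration only.

```python
# A minimal sketch of articulation-conditioned signed-distance prediction.
# Assumptions (not from the paper): 16 hand joints, a 64-d per-point visual
# feature, and query points already expressed in the hand-centric frame.
import torch
import torch.nn as nn

NUM_JOINTS = 16      # assumed number of hand joints
VIS_FEAT_DIM = 64    # assumed dimension of the per-point visual feature


class ArticulationSDF(nn.Module):
    """Predicts signed distance for query points conditioned on hand joint
    locations (articulation) and a visual feature sampled per point."""

    def __init__(self, hidden: int = 256):
        super().__init__()
        # Per point we feed: its xyz, its offsets to every joint
        # (articulation-aware coordinates), and its visual feature.
        in_dim = 3 + 3 * NUM_JOINTS + VIS_FEAT_DIM
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar signed distance
        )

    def forward(self, points, joints, vis_feat):
        # points:   (B, N, 3)  query points in the hand-centric frame
        # joints:   (B, J, 3)  hand joint locations in the same frame
        # vis_feat: (B, N, F)  visual feature per query point
        B, N, _ = points.shape
        # Articulation-aware coordinates: offset of each point to each joint.
        offsets = points[:, :, None, :] - joints[:, None, :, :]   # (B, N, J, 3)
        offsets = offsets.reshape(B, N, -1)                       # (B, N, 3*J)
        x = torch.cat([points, offsets, vis_feat], dim=-1)
        return self.mlp(x).squeeze(-1)                            # (B, N) SDF values


if __name__ == "__main__":
    model = ArticulationSDF()
    sdf = model(torch.randn(2, 1024, 3),
                torch.randn(2, NUM_JOINTS, 3),
                torch.randn(2, 1024, VIS_FEAT_DIM))
    print(sdf.shape)  # torch.Size([2, 1024])
```

In this sketch the articulation conditioning is the explicit concatenation of point-to-joint offsets, which is one plausible reading of "articulation-aware coordinates"; the paper's actual network architecture and feature extraction may differ.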
