Paper Title
RIN: Textured Human Model Recovery and Imitation with a Single Image
Paper Authors
Paper Abstract
Human imitation has recently become a topical problem, driven by GANs' ability to disentangle human pose from body content. However, the latest methods pay little attention to 3D information and require a massive number of input images to avoid self-occlusion. In this paper, we propose RIN, a novel volume-based framework that reconstructs a textured 3D model from a single image and imitates a subject with the generated model. Specifically, to estimate most of the human texture, we propose a U-Net-like front-to-back translation network. Given both the front and the estimated back image as input, a textured volume recovery module allows us to color a volumetric human. A sequence of 3D poses then drives the colored volume through Flowable Disentangle Networks as a volume-to-volume translation task. To project volumes onto a 2D plane during training, we design a differentiable depth-aware renderer. Our experiments demonstrate that the volume-based model is adequate for human imitation and that the back view can be estimated reliably by our network. While prior works based on either 2D poses or semantic maps often fail due to the unstable appearance of a human, our framework still produces concrete results that are competitive with those generated from multi-view input.
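The abstract gives no architectural details, so the following PyTorch sketch is only an illustration of what a "U-Net-like front-to-back translation network" could look like; the class name `FrontToBackUNet`, the channel widths, and every layer choice are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FrontToBackUNet(nn.Module):
    """Hypothetical U-Net-like front-to-back translator: given a front-view
    RGB image, hallucinate the corresponding back-view image. Skip
    connections let the decoder reuse spatial detail from the encoder."""
    def __init__(self, ch=64):
        super().__init__()
        # Encoder: progressively downsample the front image.
        self.enc1 = self._down(3, ch)           # H/2
        self.enc2 = self._down(ch, ch * 2)      # H/4
        self.enc3 = self._down(ch * 2, ch * 4)  # H/8
        # Decoder: upsample back to full resolution, concatenating
        # encoder features at matching scales (U-Net skip connections).
        self.dec3 = self._up(ch * 4, ch * 2)
        self.dec2 = self._up(ch * 4, ch)        # input: dec3 output + enc2 skip
        self.dec1 = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, 3, 4, stride=2, padding=1),
            nn.Tanh(),  # back-view RGB in [-1, 1]
        )

    @staticmethod
    def _down(cin, cout):
        return nn.Sequential(
            nn.Conv2d(cin, cout, 4, stride=2, padding=1),
            nn.InstanceNorm2d(cout),
            nn.LeakyReLU(0.2, inplace=True),
        )

    @staticmethod
    def _up(cin, cout):
        return nn.Sequential(
            nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
            nn.InstanceNorm2d(cout),
            nn.ReLU(inplace=True),
        )

    def forward(self, front):
        e1 = self.enc1(front)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d3 = self.dec3(e3)
        d2 = self.dec2(torch.cat([d3, e2], dim=1))
        return self.dec1(torch.cat([d2, e1], dim=1))

# Example: a 256x256 front image yields a 256x256 estimated back image.
if __name__ == "__main__":
    net = FrontToBackUNet()
    front = torch.randn(1, 3, 256, 256)
    print(net(front).shape)  # torch.Size([1, 3, 256, 256])
```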
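Likewise, the differentiable depth-aware renderer is only named in the abstract. Below is a minimal sketch of one standard way such a renderer can work, compositing a colored voxel grid onto the image plane with soft, differentiable visibility weights along depth; the function name `render_volume`, the tensor layout, and the alpha-compositing formulation are all assumptions rather than the paper's exact method.

```python
import torch

def render_volume(volume, density):
    """Hypothetical differentiable depth-aware renderer: project a colored
    voxel grid to a 2D image by alpha-compositing along the depth axis.

    volume:  (B, 3, D, H, W) RGB color per voxel
    density: (B, 1, D, H, W) soft occupancy in [0, 1]
    returns: (B, 3, H, W) rendered image
    """
    free = 1.0 - density
    # Transmittance: probability that a ray reaches depth d unoccluded,
    # i.e. the cumulative product of (1 - density) over voxels in front.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(free[:, :, :1]), free[:, :, :-1]], dim=2),
        dim=2,
    )
    weights = trans * density              # per-depth compositing weights
    return (weights * volume).sum(dim=2)   # integrate colors along depth

# Usage: gradients flow through the render back to the volume, which is
# what makes end-to-end training of a volume-based model possible.
if __name__ == "__main__":
    vol = torch.rand(1, 3, 32, 64, 64)
    occ = torch.rand(1, 1, 32, 64, 64, requires_grad=True)
    img = render_volume(vol, occ)
    print(img.shape)  # torch.Size([1, 3, 64, 64])
```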