Paper Title
Populating 3D Scenes by Learning Human-Scene Interaction
Paper Authors
Paper Abstract
Humans live within a 3D space and constantly interact with it to perform tasks. Such interactions involve physical contact between surfaces that is semantically meaningful. Our goal is to learn how humans interact with scenes and leverage this to enable virtual characters to do the same. To that end, we introduce a novel Human-Scene Interaction (HSI) model that encodes proximal relationships, called POSA for "Pose with prOximitieS and contActs". The representation of interaction is body-centric, which enables it to generalize to new scenes. Specifically, POSA augments the SMPL-X parametric human body model such that, for every mesh vertex, it encodes (a) the contact probability with the scene surface and (b) the corresponding semantic scene label. We learn POSA with a VAE conditioned on the SMPL-X vertices, and train on the PROX dataset, which contains SMPL-X meshes of people interacting with 3D scenes, and the corresponding scene semantics from the PROX-E dataset. We demonstrate the value of POSA with two applications. First, we automatically place 3D scans of people in scenes. We use a SMPL-X model fit to the scan as a proxy and then find its most likely placement in 3D. POSA provides an effective representation to search for "affordances" in the scene that match the likely contact relationships for that pose. We perform a perceptual study that shows significant improvement over the state of the art on this task. Second, we show that POSA's learned representation of body-scene interaction supports monocular human pose estimation that is consistent with a 3D scene, improving on the state of the art. Our model and code are available for research purposes at https://posa.is.tue.mpg.de.
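The abstract describes POSA's core representation: a body-centric feature map that, for every SMPL-X vertex, stores a contact probability and a distribution over semantic scene labels, which can then be used to score candidate placements of a body in a scene. The following is a minimal illustrative sketch of that idea, not the authors' implementation; all names, the toy vertex count, and the simplified scoring energy are our own assumptions.

```python
import numpy as np

# Hypothetical sketch of POSA's per-vertex feature map (names are ours,
# not from the released code). For each of the N body vertices we store:
#   feat[:, 0]  -> contact probability with the scene surface, in [0, 1]
#   feat[:, 1:] -> a distribution over C semantic scene labels
N_VERTICES = 8   # toy size; the real SMPL-X mesh has 10475 vertices
N_LABELS = 3     # e.g. {floor, chair, wall} in this toy example

rng = np.random.default_rng(0)

def random_feature_map(n_vertices, n_labels):
    """Build a toy per-vertex interaction feature map."""
    contact = rng.random((n_vertices, 1))            # contact probabilities
    labels = rng.random((n_vertices, n_labels))
    labels /= labels.sum(axis=1, keepdims=True)      # normalize to sum to 1
    return np.concatenate([contact, labels], axis=1)

def placement_score(feat, vertex_scene_dist, vertex_scene_label, sigma=0.05):
    """Score one candidate body placement in a scene.

    A vertex predicted to be in contact should lie near a scene surface
    whose semantic label matches the predicted one; vertices with low
    contact probability contribute little. This mirrors the paper's idea
    of searching the scene for "affordances" that match the pose, though
    the exact energy terms here are simplified stand-ins.
    """
    contact = feat[:, 0]
    pred_label = feat[:, 1:].argmax(axis=1)
    # proximity term: being close to a surface is good for contact vertices
    proximity = np.exp(-(vertex_scene_dist ** 2) / (2 * sigma ** 2))
    # semantics term: 1 where predicted and observed labels agree
    semantics = (pred_label == vertex_scene_label).astype(float)
    return float((contact * proximity * semantics).sum())

feat = random_feature_map(N_VERTICES, N_LABELS)
# two toy candidate placements: one with contact vertices touching
# matching surfaces, one floating 1 m away from everything
good = placement_score(feat, np.zeros(N_VERTICES),
                       feat[:, 1:].argmax(axis=1))
bad = placement_score(feat, np.full(N_VERTICES, 1.0),
                      feat[:, 1:].argmax(axis=1))
```

In the paper this kind of score would be evaluated over many candidate positions and orientations in the 3D scene, with the per-vertex features produced by the learned VAE rather than sampled at random.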