Paper Title
PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time
Paper Authors
Paper Abstract
Marker-less 3D human motion capture from a single colour camera has seen significant progress. However, it is a very challenging and severely ill-posed problem. In consequence, even the most accurate state-of-the-art approaches have significant limitations. Purely kinematic formulations on the basis of individual joints or skeletons, and the frequent frame-wise reconstruction in state-of-the-art methods, greatly limit 3D accuracy and temporal stability compared to multi-view or marker-based motion capture. Further, captured 3D poses are often physically incorrect and biomechanically implausible, or exhibit implausible environment interactions (floor penetration, foot skating, unnatural body leaning and strong shifting in depth), which is problematic for any use case in computer graphics. We therefore present PhysCap, the first algorithm for physically plausible, real-time and marker-less human 3D motion capture with a single colour camera at 25 fps. Our algorithm first captures 3D human poses purely kinematically. To this end, a CNN infers 2D and 3D joint positions, and subsequently an inverse kinematics step finds space-time coherent joint angles and global 3D pose. Next, these kinematic reconstructions are used as constraints in a real-time physics-based pose optimiser that accounts for environment constraints (e.g., collision handling and floor placement), gravity, and biophysical plausibility of human postures. Our approach employs a combination of ground reaction force and residual force for plausible root control, and uses a trained neural network to detect foot contact events in images. Our method captures physically plausible and temporally stable global 3D human motion, without physically implausible postures, floor penetrations or foot skating, from video in real time and in general scenes. The video is available at http://gvv.mpi-inf.mpg.de/projects/PhysCap
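The abstract describes a two-stage per-frame pipeline: a purely kinematic stage (CNN joint prediction followed by inverse kinematics) whose output is then corrected by a physics-based pose optimiser that uses detected foot contacts, ground reaction forces and residual forces. Below is a minimal Python sketch of that control flow only. All names (kinematic_stage, physics_stage, detect_foot_contacts, NUM_JOINTS, and the pose layouts) are hypothetical placeholders, not the authors' code or API, and the physics step is reduced here to a hard floor-penetration clamp rather than the dynamics-based optimisation described in the paper.

import numpy as np

NUM_JOINTS = 25  # assumption; the paper's skeleton may use a different joint count

def kinematic_stage(frame):
    """Stage 1 (kinematic): in the paper, a CNN predicts 2D and 3D joint
    positions and an inverse kinematics step fits space-time coherent joint
    angles and a global pose. Here we only return placeholders of plausible shape."""
    joints_2d = np.random.rand(NUM_JOINTS, 2)   # stand-in for the CNN's 2D output
    joints_3d = np.random.rand(NUM_JOINTS, 3)   # stand-in for the CNN's 3D output
    joint_angles = np.zeros(3 * NUM_JOINTS)     # stand-in for the IK joint angles
    root_pose = np.zeros(6)                     # global translation (xyz) + rotation (placeholder layout)
    return joints_2d, joints_3d, joint_angles, root_pose

def detect_foot_contacts(frame):
    """The paper uses a trained network to label foot contact events in the
    image; this dummy version simply reports both feet as in contact."""
    return {"left_foot": True, "right_foot": True}

def physics_stage(joint_angles, root_pose, contacts, floor_height=0.0):
    """Stage 2 (physics-based correction, heavily simplified): take the
    kinematic pose as the target and enforce environment constraints.
    PhysCap does this with ground reaction and residual forces in a
    dynamics-based optimiser; this sketch only clamps the vertical root
    position so that a foot in contact cannot penetrate the floor."""
    corrected_root = root_pose.copy()
    if contacts["left_foot"] or contacts["right_foot"]:
        # assumption: index 1 of the root pose is the vertical translation
        corrected_root[1] = max(corrected_root[1], floor_height)
    return joint_angles, corrected_root

def process_frame(frame):
    """Per-frame pipeline: kinematics -> contact detection -> physics correction."""
    _, _, angles, root = kinematic_stage(frame)
    contacts = detect_foot_contacts(frame)
    return physics_stage(angles, root, contacts)

if __name__ == "__main__":
    dummy_frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in RGB frame
    angles, root = process_frame(dummy_frame)
    print("joint angles:", angles.shape, "corrected root pose:", root)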