Paper Title
Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition
Paper Authors
Paper Abstract
In this work, we propose a new solution to 3D human pose estimation in videos. Instead of directly regressing the 3D joint locations, we draw inspiration from human skeleton anatomy and decompose the task into bone direction prediction and bone length prediction, from which the 3D joint locations can be completely derived. Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time. This prompts us to develop effective techniques that utilize global information across all the frames in a video for high-accuracy bone length prediction. Moreover, for the bone direction prediction network, we propose a fully-convolutional propagating architecture with long skip connections. Essentially, it predicts the directions of different bones hierarchically without using any time-consuming memory units (e.g., LSTM). A novel joint shift loss is further introduced to bridge the training of the bone length and bone direction prediction networks. Finally, we employ an implicit attention mechanism to feed the 2D keypoint visibility scores into the model as extra guidance, which significantly mitigates the depth ambiguity in many challenging poses. Our full model outperforms the previous best results on the Human3.6M and MPI-INF-3DHP datasets, and comprehensive evaluation validates the effectiveness of our model.
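To make the bone-based decomposition concrete, the following is a minimal sketch (not the authors' code) of how 3D joint locations can be recovered once per-bone lengths and unit directions are available. It assumes a hypothetical `parents` list describing the kinematic tree, with joints ordered so that each parent precedes its children; the names, array shapes, and toy values are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def compose_joints(bone_lengths, bone_dirs, parents, root_position=None):
    """Reconstruct 3D joint positions from bone lengths and unit directions.

    bone_lengths: (J,) array; entry j is the length of the bone ending at joint j
                  (the root entry is unused).
    bone_dirs:    (J, 3) array of unit vectors pointing from parents[j] to joint j.
    parents:      list of length J giving the parent index of each joint (-1 for root).
    """
    num_joints = len(parents)
    joints = np.zeros((num_joints, 3))
    if root_position is not None:
        joints[0] = root_position
    # Joints are assumed to be topologically ordered (parent before child),
    # so one forward pass over the tree accumulates every joint position.
    for j in range(1, num_joints):
        p = parents[j]
        joints[j] = joints[p] + bone_lengths[j] * bone_dirs[j]
    return joints

# Toy example: a 3-joint chain root -> joint 1 -> joint 2, pointing downward.
parents = [-1, 0, 1]
lengths = np.array([0.0, 0.45, 0.42])                      # illustrative lengths (metres)
dirs = np.array([[0, 0, 0], [0, -1, 0], [0, -1, 0]], float)  # unit directions
print(compose_joints(lengths, dirs, parents))
```

Because the joint positions are a deterministic function of lengths and directions, the two prediction networks can be trained jointly (e.g., via a loss on the composed joints such as the paper's joint shift loss), while the length branch can exploit the time-invariance of bone lengths across the whole video.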