Paper Title
PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling
Paper Authors
Paper Abstract
Training state-of-the-art models for human pose estimation in videos requires datasets with annotations that are difficult and expensive to obtain. Although transformers have recently been used for body pose sequence modeling, related methods rely on pseudo-ground truth to augment the currently limited training data available for learning such models. In this paper, we introduce PoseBERT, a transformer module that is fully trained on 3D Motion Capture (MoCap) data via masked modeling. It is simple, generic and versatile, as it can be plugged on top of any image-based model to turn it into a video-based model that leverages temporal information. We showcase variants of PoseBERT whose inputs range from 3D skeleton keypoints to rotations of a 3D parametric model for either the full body (SMPL) or just the hands (MANO). Since PoseBERT training is task-agnostic, the model can be applied to several tasks such as pose refinement, future pose prediction or motion completion without finetuning. Our experimental results validate that adding PoseBERT on top of various state-of-the-art pose estimation methods consistently improves their performance, while its low computational cost allows us to use it in a real-time demo for smoothly animating a robotic hand via a webcam. Test code and models are available at https://github.com/naver/posebert.
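The sketch below illustrates the masked-modeling idea described in the abstract: a transformer encoder receives a sequence of per-frame pose parameters, a random subset of frames is replaced by a learned mask token, and the network is trained to reconstruct the hidden frames. This is a minimal illustrative example, not the authors' implementation; the 6D-rotation input, layer sizes, masking ratio, and all names are assumptions made for this sketch.

```python
# Minimal sketch of masked modeling over pose sequences (illustrative only;
# dimensions and hyperparameters are assumptions, not the authors' settings).
import torch
import torch.nn as nn


class MaskedPoseTransformer(nn.Module):
    def __init__(self, pose_dim=24 * 6, d_model=256, n_layers=4, n_heads=8, max_len=128):
        super().__init__()
        self.in_proj = nn.Linear(pose_dim, d_model)                  # per-frame pose -> token
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))   # learned [MASK] embedding
        self.pos_emb = nn.Parameter(0.02 * torch.randn(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(d_model, pose_dim)                  # token -> reconstructed pose

    def forward(self, poses, mask):
        # poses: (B, T, pose_dim) per-frame pose parameters (e.g. SMPL joint rotations)
        # mask:  (B, T) boolean, True where the frame is hidden from the model
        tokens = self.in_proj(poses)
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        tokens = tokens + self.pos_emb[:, : tokens.size(1)]
        return self.out_proj(self.encoder(tokens))


# Toy training step on random tensors standing in for MoCap pose sequences.
model = MaskedPoseTransformer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
poses = torch.randn(8, 64, 24 * 6)            # (batch, frames, pose_dim)
mask = torch.rand(8, 64) < 0.3                # hide ~30% of the frames at random
pred = model(poses, mask)
loss = ((pred - poses) ** 2)[mask].mean()     # reconstruct only the masked frames
loss.backward()
opt.step()
```

At inference time, the same masking mechanism can stand in for the different tasks the abstract mentions: masking noisy frames corresponds to pose refinement, masking the last frames to future pose prediction, and masking missing frames to motion completion.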