Paper Title
Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge
Paper Authors
Paper Abstract
Modern deep learning-based 3D pose estimation approaches require large amounts of 3D pose annotations. However, existing 3D datasets lack diversity, which limits the performance of current methods and their generalization ability. Although existing methods utilize 2D pose annotations to help 3D pose estimation, they mainly focus on extracting 2D structural constraints from 2D poses, ignoring the 3D information hidden in the images. In this paper, we propose a novel method to extract weak 3D information directly from 2D images without 3D pose supervision. Firstly, we utilize 2D pose annotations and perspective prior knowledge to generate labels indicating, for a pair of keypoints, which one is closer to the camera, called relative depth. We collect a 2D pose dataset (MCPC) and generate relative depth labels for it. Based on MCPC, we propose a weakly-supervised pre-training (WSP) strategy that learns to distinguish the depth relationship between two points in an image. WSP learns the relative depth of keypoint pairs from a large number of in-the-wild images, which yields stronger depth prediction and better generalization for 3D human pose estimation. After fine-tuning on 3D pose datasets, WSP achieves state-of-the-art results on two widely-used benchmarks.
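The pre-training objective described above can be sketched as a pairwise ordinal depth loss: a network predicts a per-keypoint depth score, and supervision comes only from relative-depth labels (closer / farther / roughly equal). The sketch below is a minimal illustration of this kind of objective, not the paper's exact formulation; the function name, label convention, and use of a softplus ranking term plus an equality term are all assumptions.

```python
import torch
import torch.nn.functional as F

def relative_depth_loss(pred_depth, pair_idx, pair_label):
    """Pairwise ordinal depth loss (illustrative sketch of WSP-style pre-training).

    pred_depth: (N, K) predicted depth score for each of K keypoints.
    pair_idx:   (P, 2) long tensor of compared keypoint indices (i, j).
    pair_label: (P,) float tensor: +1 if keypoint i is farther from the
                camera than j, -1 if closer, 0 if roughly the same depth.
    """
    di = pred_depth[:, pair_idx[:, 0]]          # (N, P)
    dj = pred_depth[:, pair_idx[:, 1]]          # (N, P)
    diff = di - dj
    ordered = pair_label != 0
    # Ranking term: for ordered pairs, push sign(di - dj) toward the label.
    loss_rank = F.softplus(-pair_label[ordered] * diff[:, ordered]).mean()
    # Equality term: for "same depth" pairs, pull the two scores together.
    if (~ordered).any():
        loss_eq = diff[:, ~ordered].pow(2).mean()
    else:
        loss_eq = diff.new_zeros(())
    return loss_rank + loss_eq
```

In this formulation, pairs that are already ordered consistently with their labels incur a small softplus penalty, so the network is free to learn absolute depth scales later during 3D fine-tuning; only the ordering is constrained during pre-training.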