Paper Title
VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data
Paper Authors
Paper Abstract
While monocular 3D pose estimation methods seem to have achieved very accurate results on the public datasets, their generalization ability is largely overlooked. In this work, we perform a systematic evaluation of the existing methods and find that they incur notably larger errors when tested on different cameras, human poses and appearance. To address the problem, we introduce VirtualPose, a two-stage learning framework to exploit the hidden "free lunch" specific to this task, i.e. generating an infinite number of poses and cameras for training models at no cost. To that end, the first stage transforms images into abstract geometry representations (AGR), and the second stage maps them to 3D poses. It addresses the generalization issue from two aspects: (1) the first stage can be trained on diverse 2D datasets to reduce the risk of over-fitting to limited appearance; (2) the second stage can be trained on diverse AGR synthesized from a large number of virtual cameras and poses. It outperforms the SOTA methods without using any paired images and 3D poses from the benchmarks, which paves the way for practical applications. Code is available at https://github.com/wkom/VirtualPose.
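The abstract's "free lunch" amounts to synthesizing unlimited training pairs for the second stage by projecting 3D poses through randomly sampled virtual cameras. The sketch below illustrates that idea under explicit assumptions: it treats the AGR as projected 2D keypoints from a simple pinhole camera, and the sampled ranges for focal length, viewpoint and subject distance are illustrative only; the paper's actual AGR and camera sampling may differ.

```python
# Minimal sketch of synthesizing (AGR, 3D pose) training pairs from virtual
# cameras, assuming the AGR is a set of projected 2D keypoints (an assumption;
# the paper's concrete representation may differ).
import numpy as np

def random_camera(rng):
    """Sample a virtual pinhole camera: intrinsics K, rotation R, translation t."""
    f = rng.uniform(800.0, 1500.0)               # focal length in pixels (assumed range)
    K = np.array([[f, 0.0, 512.0],
                  [0.0, f, 512.0],
                  [0.0, 0.0, 1.0]])
    yaw = rng.uniform(-np.pi, np.pi)              # rotate the viewpoint around the subject
    R = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(yaw), 0.0, np.cos(yaw)]])
    t = np.array([0.0, 0.0, rng.uniform(3.0, 8.0)])  # place the subject 3-8 m from the camera
    return K, R, t

def project(joints_3d, K, R, t):
    """Project Nx3 world-space joints to Nx2 pixel coordinates (the assumed AGR)."""
    cam = joints_3d @ R.T + t                    # world -> camera coordinates
    uvw = cam @ K.T                              # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]              # perspective divide

rng = np.random.default_rng(0)
joints_3d = rng.normal(scale=0.5, size=(17, 3))  # stand-in for a sampled 3D pose (17 joints)
K, R, t = random_camera(rng)
agr_2d = project(joints_3d, K, R, t)             # synthetic input for the second stage
print(agr_2d.shape)                              # (17, 2)
```

Each such pair (agr_2d as input, joints_3d as target) requires no real image, which is why the second stage can be trained without any paired images and 3D poses from the benchmarks.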