Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation

Paper Title

Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation

Authors

Qihao Liu, Yi Zhang, Song Bai, Alan Yuille

Abstract

Occlusion poses a great threat to monocular multi-person 3D human pose estimation due to large variability in terms of the shape, appearance, and position of occluders. While existing methods try to handle occlusion with pose priors/constraints, data augmentation, or implicit reasoning, they still fail to generalize to unseen poses or occlusion cases and may make large mistakes when multiple people are present. Inspired by the remarkable ability of humans to infer occluded joints from visible cues, we develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation with or without occlusions. First, we split the task into two subtasks: visible keypoints detection and occluded keypoints reasoning, and propose a Deeply Supervised Encoder Distillation (DSED) network to solve the second one. To train our model, we propose a Skeleton-guided human Shape Fitting (SSF) approach to generate pseudo occlusion labels on the existing datasets, enabling explicit occlusion reasoning. Experiments show that explicitly learning from occlusions improves human pose estimation. In addition, exploiting feature-level information of visible joints allows us to reason about occluded joints more accurately. Our method outperforms both the state-of-the-art top-down and bottom-up methods on several benchmarks.
