Paper Title

Occluded Human Mesh Recovery

Paper Authors

Rawal Khirodkar, Shashank Tripathi, Kris Kitani

Paper Abstract

Top-down methods for monocular human mesh recovery have two stages: (1) detect human bounding boxes; (2) treat each bounding box as an independent single-human mesh recovery task. Unfortunately, the single-human assumption does not hold in images with multi-human occlusion and crowding. Consequently, top-down methods have difficulty recovering accurate 3D human meshes under severe person-person occlusion. To address this, we present Occluded Human Mesh Recovery (OCHMR), a novel top-down mesh recovery approach that incorporates image spatial context to overcome the limitations of the single-human assumption. The approach is conceptually simple and can be applied to any existing top-down architecture. Along with the input image, we condition the top-down model on spatial context from the image in the form of body-center heatmaps. To reason from the predicted body-center heatmaps, we introduce Contextual Normalization (CoNorm) blocks that adaptively modulate intermediate features of the top-down model. The contextual conditioning helps our model disambiguate between two severely overlapping human bounding boxes, making it robust to multi-person occlusion. Compared with state-of-the-art methods, OCHMR achieves superior performance on challenging multi-person benchmarks such as 3DPW, CrowdPose, and OCHuman. Specifically, our proposed contextual reasoning architecture applied to the SPIN model with a ResNet-50 backbone achieves 75.2 PMPJPE on 3DPW-PC, 23.6 AP on CrowdPose, and 37.7 AP on OCHuman, a significant improvement of 6.9 mm, 6.4 AP, and 20.8 AP respectively over the baseline. Code and models will be released.
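
The abstract describes CoNorm blocks that modulate intermediate backbone features based on body-center heatmaps. Below is a minimal, illustrative PyTorch sketch of how such a conditional-normalization (FiLM-style) block could be wired; the class name `CoNormBlock`, the small convolutional encoder, and the two-channel heatmap input (e.g., one centermap for the target person and one for all people in the crop) are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class CoNormBlock(nn.Module):
    """Illustrative contextual-normalization block (a sketch, not the paper's
    exact architecture): predicts per-channel scale and shift maps for an
    intermediate feature map from body-center heatmaps."""

    def __init__(self, feat_channels: int, heatmap_channels: int = 2):
        super().__init__()
        # Small conv encoder over the (resized) center heatmaps.
        self.encoder = nn.Sequential(
            nn.Conv2d(heatmap_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Heads producing spatial scale (gamma) and shift (beta) maps.
        self.to_gamma = nn.Conv2d(64, feat_channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(64, feat_channels, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor, centermaps: torch.Tensor) -> torch.Tensor:
        # Resize heatmaps to the feature resolution before encoding.
        h = nn.functional.interpolate(
            centermaps, size=feats.shape[-2:], mode="bilinear", align_corners=False
        )
        h = self.encoder(h)
        gamma = self.to_gamma(h)
        beta = self.to_beta(h)
        # FiLM-style modulation: emphasize features of the target person,
        # suppress those of overlapping neighbors.
        return feats * (1.0 + gamma) + beta


# Usage: modulate a ResNet stage output with two hypothetical center heatmaps
# (target person + all people), mimicking the conditioning described above.
feats = torch.randn(1, 256, 56, 56)       # intermediate backbone features
centermaps = torch.randn(1, 2, 224, 224)  # body-center heatmaps for the crop
block = CoNormBlock(feat_channels=256, heatmap_channels=2)
out = block(feats, centermaps)
print(out.shape)  # torch.Size([1, 256, 56, 56])
```

Because the modulation is spatial and person-specific, two crops with nearly identical pixels but different target centermaps yield different features, which is how this style of conditioning lets a top-down model separate severely overlapping bounding boxes.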
