Paper Title
Direct Dense Pose Estimation
Paper Authors
Paper Abstract
Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies, and it finds various applications, such as human body reconstruction, human pose transfer, and human action recognition. Prior dense pose estimation methods are all based on the Mask R-CNN framework and operate in a top-down manner: they first detect a bounding box for each person and then match dense correspondences within each box. Consequently, these methods lack robustness due to their critical dependence on Mask R-CNN detections, and their runtime increases drastically with the number of persons in the image. We therefore propose a novel alternative method for solving the dense pose estimation problem, called Direct Dense Pose (DDP). DDP first predicts the instance masks and a global IUV representation separately and then combines them. We also propose a simple yet effective 2D temporal-smoothing scheme to alleviate temporal jitter when dealing with video data. Experiments demonstrate that DDP overcomes the limitations of previous top-down baseline methods and achieves competitive accuracy. In addition, DDP is computationally more efficient than previous dense pose estimation methods, and it reduces jitter when applied to video sequences, a problem that plagues the previous methods.
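The two combination steps named in the abstract — assigning the shared global IUV map to each predicted instance mask, and smoothing IUV predictions across frames — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the array shapes, the elementwise masking, and the exponential-moving-average smoothing with factor `alpha` are all assumptions made for this sketch.

```python
import numpy as np

def combine_masks_with_iuv(instance_masks, global_iuv):
    """Combine per-person instance masks with one global IUV map.

    instance_masks: (N, H, W) boolean array, one mask per person.
    global_iuv:     (3, H, W) array holding the body-part index I and
                    surface coordinates U, V for every pixel.
    Returns a list of N per-person (3, H, W) IUV maps, each zeroed
    outside that person's mask.
    """
    return [global_iuv * m[None] for m in instance_masks]

def smooth_iuv(prev_iuv, curr_iuv, alpha=0.8):
    """Illustrative 2D temporal smoothing via an exponential moving
    average of consecutive per-frame IUV maps (alpha is a guessed
    hyperparameter, not a value from the paper)."""
    if prev_iuv is None:  # first frame: nothing to smooth against
        return curr_iuv
    return alpha * curr_iuv + (1.0 - alpha) * prev_iuv
```

Because the IUV map is predicted once for the whole image, the per-person cost of the combination step is a single masking operation, which is one plausible reading of why the method's runtime grows slowly with the number of people.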