Paper Title
Towards Accurate Human Pose Estimation in Videos of Crowded Scenes
Paper Authors
Paper Abstract
Video-based human pose estimation in crowded scenes is a challenging problem due to occlusion, motion blur, scale variation, viewpoint change, etc. Prior approaches often fail on this problem because of (1) the lack of temporal information usage; (2) the lack of training data for crowded scenes. In this paper, we focus on improving human pose estimation in videos of crowded scenes from the perspectives of exploiting temporal context and collecting new data. In particular, we first follow the top-down strategy to detect persons and perform single-person pose estimation on each frame. Then, we refine the frame-based pose estimates with temporal context derived from optical flow. Specifically, for a given frame, we forward-propagate the historical poses from previous frames and backward-propagate the future poses from subsequent frames to the current frame, leading to stable and accurate human pose estimation in videos. In addition, we mine new data of scenes similar to the HIE dataset from the Internet to improve the diversity of the training set. In this way, our model achieves the best performance on 7 out of 13 videos and 56.33 average w\_AP on the test dataset of the HIE challenge.
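The core refinement step, propagating poses between frames via optical flow, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a precomputed dense flow field of shape (H, W, 2) giving per-pixel displacements in pixels, and a hypothetical helper `propagate_keypoints` that warps each 2D joint by the flow sampled at its nearest pixel. Backward propagation would use the same routine with the reverse flow field.

```python
import numpy as np

def propagate_keypoints(keypoints, flow):
    """Warp 2D keypoints from frame t to frame t+1 using a dense
    forward optical-flow field `flow` of shape (H, W, 2), in pixels.

    Illustrative sketch only; the paper's actual propagation scheme
    may sample and aggregate flow differently.
    """
    h, w = flow.shape[:2]
    warped = []
    for x, y in keypoints:
        # Sample the flow at the nearest valid pixel to the joint.
        xi = int(np.clip(round(x), 0, w - 1))
        yi = int(np.clip(round(y), 0, h - 1))
        dx, dy = flow[yi, xi]
        warped.append((float(x + dx), float(y + dy)))
    return warped

# Toy example: a uniform flow of (+2, +1) pixels shifts every joint accordingly.
flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[..., 0] = 2.0  # horizontal displacement
flow[..., 1] = 1.0  # vertical displacement
pose = [(1.0, 1.0), (2.0, 3.0)]
print(propagate_keypoints(pose, flow))  # → [(3.0, 2.0), (4.0, 4.0)]
```

In practice the forward-warped historical poses and backward-warped future poses would then be fused with the frame-based estimate (e.g. by confidence-weighted averaging) to stabilize the final prediction.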