Paper Title
Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
Paper Authors
Paper Abstract
Pose estimation plays a critical role in human-centered vision applications. However, it is difficult to deploy state-of-the-art HRNet-based pose estimation models on resource-constrained edge devices due to their high computational cost (more than 150 GMACs per frame). In this paper, we study efficient architecture design for real-time multi-person pose estimation on edge devices. Through gradual shrinking experiments, we reveal that HRNet's high-resolution branches are redundant for models in the low-computation region; removing them improves both efficiency and performance. Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance its capacity: Fusion Deconv Head and Large Kernel Convs. The Fusion Deconv Head removes the redundancy in high-resolution branches, allowing scale-aware feature fusion with low overhead. Large Kernel Convs significantly improve the model's capacity and receptive field while maintaining a low computational cost: with only a 25% increase in computation, 7x7 kernels achieve +14.0 mAP over 3x3 kernels on the CrowdPose dataset. On mobile platforms, LitePose reduces latency by up to 5.0x without sacrificing performance compared with prior state-of-the-art efficient pose estimation models, pushing the frontier of real-time multi-person pose estimation on edge devices. Our code and pre-trained models are released at https://github.com/mit-han-lab/litepose.
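To make the two named components concrete, below is a minimal PyTorch sketch of (a) an inverted-residual block with a configurable large depthwise kernel and (b) a deconvolution head that fuses an upsampled deep feature with an earlier high-resolution feature from the same single branch. This is an illustrative reading of the abstract, not the released LitePose implementation: all module names, channel counts, and layer shapes are assumptions.

```python
# A hedged sketch of the two ideas named in the abstract. Channel counts,
# kernel placement, and fusion-by-addition are illustrative assumptions.
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNet-style block with a configurable depthwise kernel size,
    so a 7x7 kernel enlarges the receptive field at modest extra cost."""
    def __init__(self, in_ch, out_ch, kernel_size=7, expand=4):
        super().__init__()
        mid = in_ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),            # pointwise expand
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, kernel_size,                 # large depthwise conv
                      padding=kernel_size // 2, groups=mid, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),           # pointwise project
            nn.BatchNorm2d(out_ch),
        )
        self.use_skip = in_ch == out_ch

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_skip else y

class FusionDeconvHead(nn.Module):
    """Deconv head that fuses (here: sums) upsampled deep features with an
    earlier high-resolution feature map from the single-branch backbone,
    instead of maintaining a separate high-resolution branch."""
    def __init__(self, deep_ch, skip_ch, out_ch, num_joints):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(deep_ch, out_ch, 4, stride=2,
                                         padding=1, bias=False)
        self.skip_proj = nn.Conv2d(skip_ch, out_ch, 1, bias=False)
        self.final = nn.Conv2d(out_ch, num_joints, 1)        # keypoint heatmaps

    def forward(self, deep_feat, skip_feat):
        x = self.deconv(deep_feat) + self.skip_proj(skip_feat)
        return self.final(x)

# Toy shapes: a stride-16 deep feature fused with a stride-8 skip feature.
deep = torch.randn(1, 128, 16, 16)
skip = torch.randn(1, 32, 32, 32)
head = FusionDeconvHead(deep_ch=128, skip_ch=32, out_ch=32, num_joints=17)
print(head(deep, skip).shape)  # torch.Size([1, 17, 32, 32])
```

Placing the large kernel in the depthwise convolution is what keeps the cost growth modest: the k x k term touches one channel at a time while the pointwise convolutions dominate total MACs, which is consistent with the abstract's figure of only a 25% computation increase when moving from 3x3 to 7x7 kernels.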