Paper Title
Alleviating Human-level Shift: A Robust Domain Adaptation Method for Multi-person Pose Estimation
Paper Authors
Paper Abstract
Human pose estimation has been widely studied, with most work focusing on supervised learning that requires sufficient annotations. In real applications, however, a pretrained pose estimation model usually needs to be adapted to a novel domain with no labels or only sparse labels. Such domain adaptation for 2D pose estimation has not been explored. The main reason is that a pose, by nature, has a typical topological structure and needs fine-grained features at local keypoints, whereas existing adaptation methods do not consider the topological structure of the object of interest and align whole images only coarsely. Therefore, we propose a novel domain adaptation method for multi-person pose estimation that conducts human-level topological structure alignment and fine-grained feature alignment. Our method consists of three modules: Cross-Attentive Feature Alignment (CAFA), Intra-domain Structure Adaptation (ISA), and Inter-domain Human-Topology Alignment (IHTA). CAFA adopts a bidirectional spatial attention module (BSAM) that focuses on fine-grained local feature correlation between two humans to adaptively aggregate consistent features for adaptation. We adopt ISA only in semi-supervised domain adaptation (SSDA), where it exploits the semantic relationship between corresponding keypoints to reduce the intra-domain bias. Most importantly, we propose IHTA to learn a more domain-invariant human topological representation and thereby reduce the inter-domain discrepancy. We model the human topological structure with a graph convolution network (GCN); passing messages on this graph allows high-order relations to be considered. This structure-preserving alignment based on GCN is beneficial for inferring occluded or extreme poses. Extensive experiments are conducted on two popular benchmarks, and the results demonstrate the competency of our method compared with existing supervised approaches.
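The claim that message passing on a GCN lets high-order relations be considered can be illustrated with a minimal sketch. This is not the authors' implementation: the five-joint chain skeleton, the symmetric normalization with self-loops, and the identity weight are all assumptions chosen only to show how stacking propagation steps spreads information across the skeleton graph.

```python
# Hypothetical 5-joint chain skeleton (an assumption for illustration):
# head - neck - shoulder - elbow - wrist
JOINTS = ["head", "neck", "shoulder", "elbow", "wrist"]
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a nested list of floats."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loop so each joint keeps its own feature
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0  # undirected skeleton edge
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / ((deg[i] ** 0.5) * (deg[j] ** 0.5)) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, features, weight):
    """One propagation step: ReLU(A_hat @ X @ W)."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


a_hat = normalized_adjacency(len(JOINTS), EDGES)

# A one-dimensional feature that is non-zero only at the wrist.
features = [[0.0], [0.0], [0.0], [0.0], [1.0]]
weight = [[1.0]]  # identity weight, so only graph propagation is visible

one_hop = gcn_layer(a_hat, features, weight)  # wrist signal reaches the elbow
two_hop = gcn_layer(a_hat, one_hop, weight)   # and now reaches the shoulder

for name, h1, h2 in zip(JOINTS, one_hop, two_hop):
    print(f"{name:9s} after 1 layer: {h1[0]:.3f}   after 2 layers: {h2[0]:.3f}")
```

After one layer only the wrist's direct neighbour receives its signal; after two layers the signal reaches the joint two hops away. In this sense stacked layers capture relations beyond directly connected keypoints, which is the property the abstract appeals to for occluded or extreme poses.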