A Semantic-aware Attention and Visual Shielding Network for Cloth-changing Person Re-identification

Paper Title

A Semantic-aware Attention and Visual Shielding Network for Cloth-changing Person Re-identification

Authors

Zan Gao, Hongwei Wei, Weili Guan, Jie Nie, Meng Wang, Shenyong Chen

Abstract

Cloth-changing person re-identification (ReID) is a newly emerging research topic that aims to retrieve pedestrians who have changed their clothes. Because a person's appearance varies greatly across different outfits, existing approaches struggle to extract discriminative and robust feature representations. Current works mainly focus on body shape or contour sketches, while human semantic information and the potential consistency of pedestrian features before and after a change of clothes are not fully explored or are ignored. To address these issues, this work proposes a novel semantic-aware attention and visual shielding network for cloth-changing person ReID (abbreviated as SAVS). The key idea is to shield clues related to the appearance of clothes and to focus only on visual semantic information that is not sensitive to view/posture changes. Specifically, a visual semantic encoder is first employed to locate the human body and clothing regions based on human semantic segmentation information. Then, a human semantic attention module (HSA) is proposed to highlight the human semantic information and reweight the visual feature map. In addition, a visual clothes shielding module (VCS) is designed to extract a more robust feature representation for the cloth-changing task by covering the clothing regions and focusing the model on visual semantic information unrelated to the clothes. Most importantly, these two modules are jointly explored in an end-to-end unified framework. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art methods and extracts more robust features for cloth-changing persons. Compared with FSAM (published in CVPR 2021), the method achieves improvements of 32.7% (16.5%) and 14.9% (-) in terms of mAP (rank-1) on the LTCC and PRCC datasets, respectively.
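
The two ideas named in the abstract, reweighting features toward human regions and covering clothing regions, can be illustrated with a toy sketch. This is not the authors' implementation: the label ids, the function names, the `boost` parameter, and the use of a fixed (non-learned) weight are all assumptions made for illustration, and the real HSA and VCS modules are learned network components operating on multi-channel feature maps.

```python
# Toy illustration only. Label ids below are hypothetical; the abstract
# does not specify the parsing label set used by SAVS.
BACKGROUND, HEAD, UPPER_CLOTHES, PANTS, ARMS = 0, 1, 2, 3, 4
CLOTHING_LABELS = {UPPER_CLOTHES, PANTS}


def clothing_shield_mask(parsing):
    """Binary mask over a 2D parsing map: 0.0 on clothing pixels, 1.0 elsewhere."""
    return [[0.0 if label in CLOTHING_LABELS else 1.0 for label in row]
            for row in parsing]


def human_attention_weights(parsing, boost=1.0):
    """Fixed weights that emphasize person pixels: 1 + boost on any
    non-background label, 1.0 on background (a stand-in for learned attention)."""
    return [[1.0 + boost if label != BACKGROUND else 1.0 for label in row]
            for row in parsing]


def apply_weights(feature, weights):
    """Element-wise product of a single-channel 2D feature map and a weight map."""
    return [[f * w for f, w in zip(f_row, w_row)]
            for f_row, w_row in zip(feature, weights)]


# A 3x3 parsing map: head on top, torso with arms, pants below.
parsing = [
    [BACKGROUND, HEAD, BACKGROUND],
    [ARMS, UPPER_CLOTHES, ARMS],
    [BACKGROUND, PANTS, BACKGROUND],
]
feature = [[1.0, 1.0, 1.0] for _ in range(3)]

# Shielding branch: clothing responses are zeroed out.
shielded = apply_weights(feature, clothing_shield_mask(parsing))
# Attention branch: person pixels are up-weighted relative to background.
attended = apply_weights(feature, human_attention_weights(parsing))
```

In this sketch the two branches are computed independently from the same parsing map, which loosely mirrors the abstract's description of two modules sharing one semantic encoder; how the paper actually fuses them is not stated in the abstract.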
