Paper Title


PASS: Part-Aware Self-Supervised Pre-Training for Person Re-Identification

Paper Authors

Kuan Zhu, Haiyun Guo, Tianyi Yan, Yousong Zhu, Jinqiao Wang, Ming Tang

Abstract


In person re-identification (ReID), recent studies have validated that pre-training models on unlabelled person images is much better than pre-training on ImageNet. However, these studies directly apply existing self-supervised learning (SSL) methods designed for image classification to ReID without any adaptation of the framework. These SSL methods match the outputs of local views (e.g., a red T-shirt, blue shorts) to those of the global views simultaneously, losing many details. In this paper, we propose a ReID-specific pre-training method, Part-Aware Self-Supervised pre-training (PASS), which can generate part-level features to offer fine-grained information and is more suitable for ReID. PASS divides each image into several local areas, and the local views randomly cropped from each area are assigned a specific learnable [PART] token. On the other hand, the [PART]s of all local areas are also appended to the global views. PASS learns to match the outputs of the local views and global views on the same [PART]. That is, the [PART] learned from the local views of a local area is matched only with the corresponding [PART] learned from the global views. As a result, each [PART] can focus on a specific local area of the image and extract fine-grained information about this area. Experiments show that PASS sets new state-of-the-art performance on Market1501 and MSMT17 across various ReID tasks; e.g., a vanilla ViT-S/16 pre-trained with PASS achieves 92.2%/90.2%/88.5% mAP on Market1501 for supervised/UDA/USL ReID. Our code is available at https://github.com/CASIA-IVA-Lab/PASS-reID.
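To make the per-part matching idea concrete, here is a minimal NumPy toy sketch (not the authors' implementation; the stripe splitting, 8-dim part embeddings, and cosine-based loss are illustrative assumptions): the image is split into horizontal local areas, and each local view's [PART] output is compared only against the same part's output from the global view, rather than against a single global representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def horizontal_stripes(img, num_parts):
    """Split an image array (H, W, C) into num_parts horizontal local areas,
    mimicking how PASS assigns local crops to part-specific [PART] tokens."""
    h = img.shape[0]
    bounds = np.linspace(0, h, num_parts + 1).astype(int)
    return [img[bounds[i]:bounds[i + 1]] for i in range(num_parts)]

def part_matching_loss(local_parts, global_parts):
    """Match each local view's part embedding ONLY to the corresponding
    part embedding from the global view (cosine distance, averaged).
    A generic global-vs-local loss would instead mix all parts together."""
    losses = []
    for loc, glob in zip(local_parts, global_parts):
        cos = np.dot(loc, glob) / (np.linalg.norm(loc) * np.linalg.norm(glob))
        losses.append(1.0 - cos)
    return float(np.mean(losses))

# Toy example: 3 parts on a 192x64 person image, 8-dim part embeddings.
num_parts = 3
img = rng.random((192, 64, 3))
stripes = horizontal_stripes(img, num_parts)

# Stand-ins for the [PART] outputs of the local and global branches.
local_parts = [rng.random(8) for _ in range(num_parts)]
global_parts = [rng.random(8) for _ in range(num_parts)]
loss = part_matching_loss(local_parts, global_parts)
```

Because each part is matched only to its counterpart, the loss encourages every [PART] token to specialize in one local area, which is the property the abstract attributes to PASS.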
