Paper Title
Short Range Correlation Transformer for Occluded Person Re-Identification
Paper Authors
Paper Abstract
Occluded person re-identification is one of the challenging areas of computer vision, facing problems such as inefficient feature representation and low recognition accuracy. Convolutional neural networks focus mainly on extracting local features, so they struggle to represent occluded pedestrians and the results are not satisfactory. Recently, the vision transformer has been introduced into the re-identification field and achieves state-of-the-art results by modeling global relationships among patch sequences. However, the vision transformer is inferior to convolutional neural networks at extracting local features. Therefore, we design a partial feature transformer-based person re-identification framework named PFT. The proposed PFT utilizes three modules to improve the efficiency of the vision transformer. (1) Patch full dimension enhancement module: we design a learnable tensor with the same size as the patch sequence, which is full-dimensional and deeply embedded into the patch sequence to enrich the diversity of training samples. (2) Fusion and reconstruction module: we extract the less important parts of the obtained patch sequence and fuse them with the original patch sequence to reconstruct it. (3) Spatial slicing module: we slice and group the patch sequence along the spatial direction, which effectively improves the short-range correlation of patch sequences. Experimental results on occluded and holistic re-identification datasets demonstrate that the proposed PFT network consistently achieves superior performance and outperforms state-of-the-art methods.
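The abstract only outlines the three modules, so the following is a minimal PyTorch-style sketch of how such components might be wired up, not the authors' implementation. The class names, the per-patch importance scores passed to the fusion module, the keep_ratio parameter, and the mean-based fusion step are all assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class PatchFullDimensionEnhancement(nn.Module):
    """Sketch: add a learnable tensor with the same shape as the patch
    sequence, so every dimension of every patch token is enhanced."""
    def __init__(self, num_patches, embed_dim):
        super().__init__()
        # Learnable tensor matching the full patch-sequence size (assumed zero-init).
        self.enhance = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

    def forward(self, x):                      # x: (B, N, D)
        return x + self.enhance


class FusionAndReconstruction(nn.Module):
    """Sketch: pick out the less important patches (lowest scores), fuse
    them into one token, and concatenate it back with the original sequence."""
    def __init__(self, embed_dim, keep_ratio=0.5):
        super().__init__()
        self.keep_ratio = keep_ratio           # assumed hyperparameter
        self.fuse = nn.Linear(embed_dim, embed_dim)

    def forward(self, x, scores):              # x: (B, N, D), scores: (B, N)
        B, N, D = x.shape
        k = int(N * (1 - self.keep_ratio))
        # Indices of the k least important patches.
        idx = scores.topk(k, dim=1, largest=False).indices
        minor = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, D))
        fused = self.fuse(minor.mean(dim=1, keepdim=True))   # (B, 1, D)
        return torch.cat([x, fused], dim=1)                  # reconstructed sequence


class SpatialSlicing(nn.Module):
    """Sketch: slice the patch sequence into spatial groups so that
    subsequent processing acts within each group, strengthening
    short-range correlation."""
    def __init__(self, num_groups):
        super().__init__()
        self.num_groups = num_groups

    def forward(self, x):                      # x: (B, N, D), N divisible by num_groups
        B, N, D = x.shape
        return x.view(B, self.num_groups, N // self.num_groups, D)
```

Under these assumptions, the enhancement module would be applied right after patch embedding, the fusion/reconstruction module after attention scores are available, and the slicing module before group-wise attention; the actual ordering and score definition in PFT may differ.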