Paper Title
Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification
Paper Authors
Paper Abstract
Video-based person re-identification (reID) aims at matching the same person across video clips. It is a challenging task due to the existence of redundancy among frames, newly revealed appearance, occlusion, and motion blur. In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA), to delicately aggregate spatio-temporal features into a discriminative video-level feature representation. In order to determine the contribution/importance of a spatio-temporal feature node, we propose to learn the attention from a global view with convolutional operations. Specifically, we stack its relations, i.e., pairwise correlations with respect to a representative set of reference feature nodes (S-RFNs) that represents global video information, together with the feature itself to infer the attention. Moreover, to exploit the semantics of different levels, we propose to learn multi-granularity attentions based on the relations captured at different granularities. Extensive ablation studies demonstrate the effectiveness of our attentive feature aggregation module MG-RAFA. Our framework achieves state-of-the-art performance on three benchmark datasets.
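The aggregation idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the S-RFNs are formed here by naive chunk averaging, and the attention is inferred with a random linear projection as a stand-in for the paper's learned convolutional branch; all shapes and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T, N, C, K = 8, 16, 32, 4  # frames, spatial nodes per frame, channels, reference nodes

# Spatio-temporal feature nodes: T*N nodes of dimension C
# (stand-in for the per-frame output of a CNN backbone).
feats = rng.standard_normal((T * N, C))

# A representative set of reference feature nodes (S-RFNs). The paper derives
# these from global video information; here we simply average K equal chunks.
refs = feats.reshape(K, -1, C).mean(axis=1)  # (K, C)

def l2norm(x):
    """Row-wise L2 normalization."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

# Pairwise relations: cosine similarity of every node to every reference node.
relations = l2norm(feats) @ l2norm(refs).T  # (T*N, K)

# Stack the relations with the feature itself, then infer a scalar attention
# score per node. The paper uses convolutions here; a random linear projection
# is a hypothetical placeholder.
stacked = np.concatenate([relations, feats], axis=1)  # (T*N, K + C)
w = rng.standard_normal(K + C)
scores = stacked @ w
attn = np.exp(scores - scores.max())
attn /= attn.sum()  # softmax over all spatio-temporal nodes

# Attentive aggregation into a single video-level feature vector.
video_feat = attn @ feats  # (C,)
print(video_feat.shape)
```

In the full MG-RAFA module this computation is repeated per granularity (e.g. on channel groups or pooled feature maps) and the resulting representations are combined, so the attention can exploit semantics at multiple levels.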