An End-to-End Transformer Model for Crowd Localization

Paper Title

An End-to-End Transformer Model for Crowd Localization

Authors

Liang, Dingkang; Xu, Wei; Bai, Xiang

Abstract

Crowd localization, predicting head positions, is a more practical and higher-level task than simply counting. Existing methods employ pseudo-bounding boxes or pre-designed localization maps, relying on complex post-processing to obtain the head positions. In this paper, we propose an elegant, end-to-end Crowd Localization Transformer named CLTR that solves the task in the regression-based paradigm. The proposed method views crowd localization as a direct set prediction problem, taking extracted features and trainable embeddings as the input to the transformer decoder. To reduce ambiguous points and generate more reasonable matching results, we introduce a KMO-based Hungarian matcher, which adopts the nearby context as an auxiliary matching cost. Extensive experiments conducted on five datasets in various data settings show the effectiveness of our method. In particular, the proposed method achieves the best localization performance on the NWPU-Crowd, UCF-QNRF, and ShanghaiTech Part A datasets.
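The abstract frames localization as set prediction solved by Hungarian matching with an auxiliary "nearby context" (KMO) cost, but it does not spell out the cost terms. The following is only a minimal sketch under stated assumptions, not the authors' implementation: a point's context is assumed to be its sorted distances to its k nearest neighbours within its own set, and the total cost is a hypothetical weighted sum of an L1 point distance, an L1 context difference, and a negative confidence score (weights `w_point`, `w_ctx`, `w_cls`). The functions `hungarian`, `knn_context`, and `match` are hypothetical names; the solver is a plain-Python Kuhn-Munkres (Hungarian) assignment.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hungarian(cost: List[List[float]]) -> List[int]:
    """Min-cost assignment for an n x m cost matrix with n <= m.

    Returns assign[i] = column assigned to row i (potentials-based Kuhn-Munkres).
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials for rows / columns
    p = [0] * (m + 1)    # p[j] = row (1-based) matched to column j, 0 = free
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while True:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


def knn_context(points: Sequence[Point], k: int) -> List[List[float]]:
    """ASSUMED context: sorted distances to the k nearest other points in the same set."""
    ctx = []
    for i, a in enumerate(points):
        d = sorted(math.dist(a, b) for j, b in enumerate(points) if j != i)[:k]
        ctx.append(d + [0.0] * (k - len(d)))  # zero-pad small sets
    return ctx


def match(preds: Sequence[Point], scores: Sequence[float], gts: Sequence[Point],
          k: int = 1, w_point: float = 1.0, w_ctx: float = 1.0,
          w_cls: float = 1.0) -> List[Tuple[int, int]]:
    """Return (gt_index, pred_index) pairs; assumes len(gts) <= len(preds)."""
    if not gts:
        return []
    pred_ctx, gt_ctx = knn_context(preds, k), knn_context(gts, k)
    cost = []
    for g, gc in zip(gts, gt_ctx):
        row = []
        for p, pc, s in zip(preds, pred_ctx, scores):
            point_cost = abs(p[0] - g[0]) + abs(p[1] - g[1])        # L1 position cost
            ctx_cost = sum(abs(a - b) for a, b in zip(pc, gc))      # assumed context cost
            row.append(w_point * point_cost + w_ctx * ctx_cost - w_cls * s)
        cost.append(row)
    return list(enumerate(hungarian(cost)))


gts = [(0.0, 0.0), (10.0, 0.0)]
preds = [(9.5, 0.2), (0.3, -0.1), (50.0, 50.0)]
print(match(preds, [0.9, 0.8, 0.1], gts))  # [(0, 1), (1, 0)]
```

Unmatched predictions (here the far-away third point) would be treated as background during training. The context term only breaks ties between geometrically similar candidates; its actual form and weighting in CLTR should be taken from the paper itself.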
