End-to-end Tracking with a Multi-query Transformer

Paper Title

End-to-end Tracking with a Multi-query Transformer

Authors

Bruno Korbar, Andrew Zisserman

Abstract

Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about the location, appearance, and identity of the objects in a scene over time. Our aim in this paper is to move beyond tracking-by-detection approaches, which perform well on datasets where the object classes are known, to class-agnostic tracking that also performs well for unknown object classes. To this end, we make the following three contributions: first, we introduce semantic detector queries that enable an object to be localized by specifying its approximate position, or its appearance, or both; second, we use these queries within an auto-regressive framework for tracking, and propose a multi-query tracking transformer (MQT) model for simultaneous tracking and appearance-based re-identification (reID), based on the transformer architecture with deformable attention. This formulation allows the tracker to operate in a class-agnostic manner, and the model can be trained end-to-end; finally, we demonstrate that MQT performs competitively on standard MOT benchmarks, outperforms all baselines on generalised-MOT, and generalises well to much harder tracking problems such as tracking any object on the TAO dataset.
