Paper title
Efficient Visual Tracking via Hierarchical Cross-Attention Transformer
Paper authors
Paper abstract
In recent years, visual object tracking has made great progress in accuracy. This progress is mainly attributed to powerful networks (such as transformers) and additional modules (such as online update and refinement modules). Tracking speed, however, has received less attention: most state-of-the-art trackers settle for real-time speed on powerful GPUs. Practical applications demand higher tracking speed, especially on edge platforms with limited resources. In this work, we present an efficient tracking method based on a hierarchical cross-attention transformer, named HCAT. Our model runs at about 195 fps on a GPU, 45 fps on a CPU, and 55 fps on the NVIDIA Jetson AGX Xavier edge AI platform. Experiments show that HCAT achieves promising results on LaSOT, GOT-10k, TrackingNet, NFS, OTB100, UAV123, and VOT2020. Code and models are available at https://github.com/chenxin-dlut/HCAT.