Paper Title

Divert More Attention to Vision-Language Tracking

Paper Authors

Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing

Paper Abstract

Relying on Transformers for complex visual feature learning, object tracking has witnessed a new standard of state-of-the-art (SOTA) performance. However, this advancement is accompanied by larger training data and longer training periods, making tracking increasingly expensive. In this paper, we demonstrate that reliance on Transformers is not necessary, and that pure ConvNets remain competitive, and even better yet more economical and friendly, in achieving SOTA tracking. Our solution is to unleash the power of multimodal vision-language (VL) tracking, simply using ConvNets. The essence lies in learning novel unified-adaptive VL representations with our modality mixer (ModaMixer) and asymmetrical ConvNet search. We show that our unified-adaptive VL representation, learned purely with ConvNets, is a simple yet strong alternative to Transformer visual features: it improves a CNN-based Siamese tracker by 14.5% in SUC on the challenging LaSOT benchmark (50.7% → 65.2%), even outperforming several Transformer-based SOTA trackers. Beyond empirical results, we theoretically analyze our approach to evidence its effectiveness. By revealing the potential of VL representation, we expect the community to divert more attention to VL tracking, and we hope to open more possibilities for future tracking beyond Transformers. Code and models will be released at https://github.com/JudasDie/SOTS.
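
The abstract describes the ModaMixer only at a high level: a modality mixer that fuses a language description with ConvNet visual features into a unified-adaptive VL representation. As one plausible reading, a minimal PyTorch sketch is given below, assuming the pooled language embedding acts as a channel-wise selector over the visual feature map. The class name ModaMixerSketch and all dimensions are illustrative assumptions, not the released implementation; see the SOTS repository for the actual code.

```python
import torch
import torch.nn as nn


class ModaMixerSketch(nn.Module):
    """Hypothetical sketch of a ModaMixer-style module: the language
    embedding is projected to per-channel gates that reweight the
    ConvNet visual features. Illustrative only, not the paper's code."""

    def __init__(self, lang_dim: int, vis_channels: int):
        super().__init__()
        # Project the sentence embedding to one gate per visual channel.
        self.selector = nn.Sequential(
            nn.Linear(lang_dim, vis_channels),
            nn.Sigmoid(),
        )

    def forward(self, vis_feat: torch.Tensor, lang_emb: torch.Tensor) -> torch.Tensor:
        # vis_feat: (B, C, H, W) ConvNet feature map
        # lang_emb: (B, D) pooled embedding of the language description
        gate = self.selector(lang_emb)           # (B, C)
        gate = gate.unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        return vis_feat * gate                   # channel-wise reweighting


# Usage: mix a BERT-style sentence embedding (assumed 768-d) into the
# 256-channel features of a Siamese tracker branch.
mixer = ModaMixerSketch(lang_dim=768, vis_channels=256)
vis = torch.randn(2, 256, 31, 31)
txt = torch.randn(2, 768)
out = mixer(vis, txt)  # same shape as vis: (2, 256, 31, 31)
```

Under this reading, the language signal adaptively selects which visual channels to emphasize, which is consistent with the abstract's claim that the resulting VL representation can substitute for Transformer visual features in a CNN-based Siamese tracker.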
