Paper Title

Rethinking Vision Transformers for MobileNet Size and Speed

Paper Authors

Yanyu Li, Ju Hu, Yang Wen, Georgios Evangelidis, Kamyar Salahi, Yanzhi Wang, Sergey Tulyakov, Jian Ren

Paper Abstract

With the success of Vision Transformers (ViTs) in computer vision tasks, recent works have tried to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices. Multiple approaches have been proposed to accelerate the attention mechanism, improve inefficient designs, or incorporate mobile-friendly lightweight convolutions to form hybrid architectures. However, ViT and its variants still have higher latency or considerably more parameters than lightweight CNNs, even compared with the years-old MobileNet. In practice, latency and size are both crucial for efficient deployment on resource-constrained hardware. In this work, we investigate a central question: can transformer models run as fast as MobileNet and maintain a similar size? We revisit the design choices of ViTs and propose a novel supernet with low latency and high parameter efficiency. We further introduce a novel fine-grained joint search strategy for transformer models that finds efficient architectures by optimizing latency and the number of parameters simultaneously. The proposed models, EfficientFormerV2, achieve 3.5% higher top-1 accuracy than MobileNetV2 on ImageNet-1K with similar latency and parameters. This work demonstrates that properly designed and optimized vision transformers can achieve high performance even with MobileNet-level size and speed.
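
The abstract describes a joint search that optimizes latency and parameter count at the same time. As a rough illustration only, and not the paper's actual search algorithm, the sketch below shows one generic way to rank candidate sub-networks sampled from a supernet under both a latency budget and a parameter budget; the budgets, penalty exponents, and candidate names are all assumptions made for the example.

```python
# Illustrative sketch: a generic joint latency-and-parameter-aware score
# for ranking candidate sub-networks. This is NOT the exact procedure
# used in the paper; budgets and exponents below are made-up values.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical sub-network identifier
    accuracy: float    # estimated top-1 accuracy (%)
    latency_ms: float  # measured on-device latency in milliseconds
    params_m: float    # parameter count in millions


def efficiency_score(c: Candidate,
                     latency_budget_ms: float = 1.5,
                     params_budget_m: float = 3.5,
                     alpha: float = 1.0,
                     beta: float = 1.0) -> float:
    """Score a candidate by accuracy, discounted by how far it exceeds
    the latency and parameter budgets (alpha/beta set the trade-off)."""
    latency_penalty = (c.latency_ms / latency_budget_ms) ** alpha
    params_penalty = (c.params_m / params_budget_m) ** beta
    return c.accuracy / (latency_penalty * params_penalty)


# Example usage: rank a few hypothetical sub-networks from a supernet.
candidates = [
    Candidate("subnet_a", accuracy=75.0, latency_ms=1.4, params_m=3.4),
    Candidate("subnet_b", accuracy=76.2, latency_ms=2.1, params_m=5.0),
    Candidate("subnet_c", accuracy=74.1, latency_ms=1.1, params_m=3.0),
]
best = max(candidates, key=efficiency_score)
print(best.name, round(efficiency_score(best), 2))
```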
