Paper Title

EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

Paper Authors

Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman Khan, Syed Waqas Zamir, Rao Muhammad Anwer, Fahad Shahbaz Khan

Paper Abstract

In the pursuit of ever-increasing accuracy, large and complex neural networks are usually developed. Such models demand high computational resources and therefore cannot be deployed on edge devices. It is of great interest to build resource-efficient general-purpose networks due to their usefulness in several application areas. In this work, we strive to effectively combine the strengths of both CNN and Transformer models and propose a new efficient hybrid architecture, EdgeNeXt. Specifically, in EdgeNeXt we introduce a split depth-wise transpose attention (SDTA) encoder that splits the input tensor into multiple channel groups and utilizes depth-wise convolution along with self-attention across channel dimensions to implicitly increase the receptive field and encode multi-scale features. Our extensive experiments on classification, detection, and segmentation tasks reveal the merits of the proposed approach, outperforming state-of-the-art methods with comparatively lower compute requirements. Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% and a 28% reduction in FLOPs. Further, our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K. The code and models are available at https://t.ly/_Vu9.
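Although the abstract does not spell out the exact block design, the two ingredients it names, channel splits processed by depth-wise convolutions and self-attention applied across the channel dimension, can be sketched compactly. Below is a minimal, illustrative PyTorch sketch of that idea; the class name `SDTASketch`, the cascaded wiring of the splits, the normalization of queries and keys, and the 1x1 projections are our own assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SDTASketch(nn.Module):
    """Illustrative sketch (not the official EdgeNeXt code): channel splits with
    depth-wise convs, followed by transposed (channel-wise) self-attention."""

    def __init__(self, dim: int, num_splits: int = 4):
        super().__init__()
        assert dim % num_splits == 0
        g = dim // num_splits
        # One depth-wise 3x3 conv per split after the first; feeding each
        # split's output into the next cascades the receptive field
        # (assumed wiring, chosen to encode multi-scale features).
        self.dw_convs = nn.ModuleList(
            [nn.Conv2d(g, g, 3, padding=1, groups=g) for _ in range(num_splits - 1)]
        )
        self.num_splits = num_splits
        self.qkv = nn.Conv2d(dim, dim * 3, 1)  # 1x1 conv producing Q, K, V
        self.proj = nn.Conv2d(dim, dim, 1)     # output projection (assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        # Split channels into groups and convolve them in a cascade.
        splits = torch.chunk(x, self.num_splits, dim=1)
        outs, prev = [splits[0]], splits[0]
        for conv, s in zip(self.dw_convs, splits[1:]):
            prev = conv(s + prev)
            outs.append(prev)
        y = torch.cat(outs, dim=1)

        # Transposed self-attention: the attention map is C x C (computed
        # across channels), so its cost grows linearly with H * W.
        b, c, h, w = y.shape
        q, k, v = self.qkv(y).flatten(2).chunk(3, dim=1)   # each (B, C, H*W)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)   # (B, C, C)
        out = (attn @ v).reshape(b, c, h, w)
        return x + self.proj(out)  # residual connection


if __name__ == "__main__":
    block = SDTASketch(dim=64)
    print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```

The efficiency argument rests on the transposed attention: a spatial attention map would be (H·W) x (H·W), whereas attending across channels keeps the map at C x C regardless of image resolution, which is what makes such a block attractive for edge devices.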
