论文标题
TransDeepLab:用于医疗图像分割的基于无卷积变压器的DeepLab V3+
TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation
论文作者
论文摘要
多年来,卷积神经网络(CNN)已成为多种计算机视觉任务的事实上的标准。特别是,基于开创性体系结构(例如具有跳过连接的U形模型)或具有金字塔池的Artous卷积的深度神经网络已针对广泛的医学图像分析任务进行了量身定制。此类架构的主要优点是它们容易拘留多功能本地功能。但是,作为一般共识,CNN无法捕获由于卷积操作的固有特性的内在特性而捕获长期依赖性和空间相关性。另外,变压器从全球信息建模中获利,源于自我发项机制,最近在自然语言处理和计算机视觉方面取得了出色的表现。然而,以前的研究证明,局部和全局特征对于密集预测的深层模型至关重要,例如以不同的形状和配置对复杂的结构进行分割。为此,本文提出了TransDeeplab,这是一种新型的DeepLab样纯变压器,用于医学图像分割。具体而言,我们用移位的窗口利用层次的Swin-Transformer来扩展DeepLabV3并建模现有的空间金字塔池(ASPP)模块。对相关文献的彻底搜索表明,我们是第一个使用基于纯变压器模型对开创性DeepLab模型进行建模的人。有关各种医学图像分割任务的广泛实验验证了我们的方法是否在视觉变压器和基于CNN的方法的合并中表现出色或与大多数当代作品相提并论,并显着降低了模型复杂性。代码和训练有素的模型可在https://github.com/rezazad68/transdeeplab上公开获得
Convolutional neural networks (CNNs) have been the de facto standard in a diverse set of computer vision tasks for many years. Especially, deep neural networks based on seminal architectures such as U-shaped models with skip-connections or atrous convolution with pyramid pooling have been tailored to a wide range of medical image analysis tasks. The main advantage of such architectures is that they are prone to detaining versatile local features. However, as a general consensus, CNNs fail to capture long-range dependencies and spatial correlations due to the intrinsic property of confined receptive field size of convolution operations. Alternatively, Transformer, profiting from global information modelling that stems from the self-attention mechanism, has recently attained remarkable performance in natural language processing and computer vision. Nevertheless, previous studies prove that both local and global features are critical for a deep model in dense prediction, such as segmenting complicated structures with disparate shapes and configurations. To this end, this paper proposes TransDeepLab, a novel DeepLab-like pure Transformer for medical image segmentation. Specifically, we exploit hierarchical Swin-Transformer with shifted windows to extend the DeepLabv3 and model the Atrous Spatial Pyramid Pooling (ASPP) module. A thorough search of the relevant literature yielded that we are the first to model the seminal DeepLab model with a pure Transformer-based model. Extensive experiments on various medical image segmentation tasks verify that our approach performs superior or on par with most contemporary works on an amalgamation of Vision Transformer and CNN-based methods, along with a significant reduction of model complexity. The codes and trained models are publicly available at https://github.com/rezazad68/transdeeplab