Paper Title
D-Former: A U-shaped Dilated Transformer for 3D Medical Image Segmentation
Paper Authors
Paper Abstract
Computer-aided medical image segmentation has been applied widely in diagnosis and treatment to obtain clinically useful information on the shapes and volumes of target organs and tissues. In the past several years, convolutional neural network (CNN) based methods (e.g., U-Net) have dominated this area, but still suffer from inadequate capture of long-range information. Hence, recent work introduced computer vision Transformer variants for medical image segmentation tasks and obtained promising performance. Such Transformers model long-range dependency by computing pair-wise patch relations. However, they incur prohibitive computational costs, especially on 3D medical images (e.g., CT and MRI). In this paper, we propose a new method called Dilated Transformer, which conducts self-attention on pair-wise patch relations captured alternately in local and global scopes. Inspired by dilated convolution kernels, we conduct global self-attention in a dilated manner, enlarging receptive fields without increasing the number of patches involved and thus reducing computational costs. Based on this Dilated Transformer design, we construct a U-shaped encoder-decoder hierarchical architecture called D-Former for 3D medical image segmentation. Experiments on the Synapse and ACDC datasets show that our D-Former model, trained from scratch, outperforms various competitive CNN-based and Transformer-based segmentation models at a low computational cost, without a time-consuming pre-training process.
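To make the dilated-attention idea concrete, below is a minimal PyTorch sketch of group-wise self-attention over a flattened sequence of patch tokens. This is an illustration under our own assumptions (the class name GroupedSelfAttention, a 1D token layout, and a sequence length divisible by the group size), not the authors' released implementation. With dilated=False a group holds contiguous tokens (local scope); with dilated=True a group samples tokens at a fixed stride across the sequence (global scope), so the receptive field grows while the per-group attention cost stays O(N·g) rather than O(N²).

```python
import torch
import torch.nn as nn

class GroupedSelfAttention(nn.Module):
    """Self-attention restricted to fixed-size groups of patch tokens.

    Hypothetical sketch of the dilated-attention idea, not the paper's code.
    dilated=False: each group holds g consecutive tokens (local scope).
    dilated=True:  each group samples tokens at a fixed stride across the
                   whole sequence (global scope), enlarging the receptive
                   field without adding tokens to any group.
    """
    def __init__(self, dim, num_heads, group_size, dilated=False):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.group_size = group_size
        self.dilated = dilated

    def forward(self, x):
        # x: (B, N, C); assumes N is divisible by group_size.
        B, N, C = x.shape
        g = self.group_size
        s = N // g  # number of groups; also the sampling stride when dilated
        if self.dilated:
            # Group j gathers tokens {j, j+s, j+2s, ...}: stride-s sampling.
            x = x.view(B, g, s, C).transpose(1, 2)  # (B, s, g, C)
        else:
            # Group j gathers g consecutive tokens: a local window.
            x = x.view(B, s, g, C)
        x = x.reshape(B * s, g, C)
        # Full attention inside each group only: O(N * g) instead of O(N^2).
        x, _ = self.attn(x, x, x, need_weights=False)
        x = x.view(B, s, g, C)
        if self.dilated:
            x = x.transpose(1, 2)  # undo the dilated regrouping
        return x.reshape(B, N, C)

# Alternating local and dilated (global) scopes, as the abstract describes:
tokens = torch.randn(2, 64, 96)  # (batch, flattened patches, channels)
local_attn = GroupedSelfAttention(dim=96, num_heads=4, group_size=8)
global_attn = GroupedSelfAttention(dim=96, num_heads=4, group_size=8, dilated=True)
out = global_attn(local_attn(tokens))  # (2, 64, 96)
```

The usage lines at the end show the alternation the abstract refers to: a local block mixes information within each window, and the following dilated block connects tokens across windows, so global context spreads without any single attention call touching more than g tokens at a time.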