Paper Title
Contextual Attention Network: Transformer Meets U-Net
Paper Authors
Paper Abstract
Currently, convolutional neural networks (CNNs), e.g., U-Net, have become the de facto standard and attained immense success in medical image segmentation. However, CNN-based methods are a double-edged sword: they fail to build long-range dependencies and global context connections due to the limited receptive field that stems from the intrinsic characteristics of the convolution operation. Hence, recent articles have exploited Transformer variants for medical image segmentation tasks, which open up great opportunities thanks to their innate capability to capture long-range correlations through the attention mechanism. Although carefully designed, most of these studies fall short in capturing local information, resulting in less precise delineation of boundary areas. In this paper, we propose a contextual attention network to tackle the aforementioned limitations. The proposed method uses the strength of the Transformer module to model long-range contextual dependencies. Simultaneously, it utilizes a CNN encoder to capture local semantic information. In addition, an object-level representation is included to model the regional interaction map. The extracted hierarchical features are then fed to the contextual attention module, which adaptively recalibrates the representation space using the local information and emphasizes the informative regions while taking into account the long-range contextual dependencies derived by the Transformer module. We validate our method on several large-scale public medical image segmentation datasets and achieve state-of-the-art performance. The implementation code is available at https://github.com/rezazad68/TMUnet.
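To make the described dual-branch design more concrete, below is a minimal PyTorch sketch of the general idea: a CNN encoder for local semantic features, a Transformer branch over patch tokens for long-range context, and an attention-based fusion that recalibrates the local representation with the global context. All module names, shapes, and hyper-parameters (CNNEncoder, GlobalContextBranch, ContextualAttentionFusion, patch size, depth, etc.) are illustrative assumptions, not the authors' implementation; consult the linked repository for the actual architecture.

```python
# Hypothetical sketch of a dual-branch CNN + Transformer segmenter with an
# attention-based fusion step, loosely following the abstract's description.
import torch
import torch.nn as nn


class CNNEncoder(nn.Module):
    """Small convolutional encoder capturing local semantic information."""

    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)  # (B, ch, H, W)


class GlobalContextBranch(nn.Module):
    """Transformer encoder over patch tokens to model long-range dependencies."""

    def __init__(self, ch=64, patch=8, depth=2, heads=4):
        super().__init__()
        self.patch = patch
        self.proj = nn.Conv2d(ch, ch, kernel_size=patch, stride=patch)  # patch embedding
        layer = nn.TransformerEncoderLayer(d_model=ch, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, feat):
        tokens = self.proj(feat)                    # (B, ch, H/p, W/p)
        b, c, h, w = tokens.shape
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, ch) token sequence
        tokens = self.encoder(tokens)               # global interactions via attention
        tokens = tokens.transpose(1, 2).reshape(b, c, h, w)
        return nn.functional.interpolate(           # back to full resolution
            tokens, scale_factor=self.patch, mode="bilinear", align_corners=False)


class ContextualAttentionFusion(nn.Module):
    """Recalibrate local features with the global context via a learned gate."""

    def __init__(self, ch=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, local_feat, global_feat):
        attn = self.gate(torch.cat([local_feat, global_feat], dim=1))
        return local_feat * attn + global_feat * (1 - attn)


class DualBranchSegmenter(nn.Module):
    """Toy end-to-end model: CNN branch + Transformer branch + fusion + head."""

    def __init__(self, in_ch=3, ch=64, num_classes=1):
        super().__init__()
        self.cnn = CNNEncoder(in_ch, ch)
        self.ctx = GlobalContextBranch(ch)
        self.fuse = ContextualAttentionFusion(ch)
        self.head = nn.Conv2d(ch, num_classes, 1)

    def forward(self, x):
        local_feat = self.cnn(x)
        global_feat = self.ctx(local_feat)
        fused = self.fuse(local_feat, global_feat)
        return self.head(fused)  # per-pixel logits


if __name__ == "__main__":
    model = DualBranchSegmenter()
    logits = model(torch.randn(1, 3, 128, 128))
    print(logits.shape)  # torch.Size([1, 1, 128, 128])
```

In this sketch the fusion gate is just a 1x1 convolution followed by a sigmoid; the paper's contextual attention module and object-level representation are more elaborate, but the sketch illustrates how local CNN features and Transformer-derived global context can be combined for dense prediction.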