Paper Title
Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis
Paper Authors
Paper Abstract
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones. Existing (supervised learning) methods often require a large amount of paired multi-modal data to train an effective synthesis model. However, it is often challenging to obtain sufficient paired data for supervised training. In practice, we often have only a small amount of paired data but a large amount of unpaired data. To take advantage of both paired and unpaired data, in this paper, we propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis. Specifically, an Edge-preserving Masked AutoEncoder (Edge-MAE) is first pre-trained in a self-supervised manner to simultaneously perform 1) image imputation for randomly masked patches in each image and 2) whole edge map estimation, which effectively learns both contextual and structural information. Besides, a novel patch-wise loss is proposed to enhance the performance of Edge-MAE by treating different masked patches differently according to the difficulties of their respective imputations. Based on this proposed pre-training, in the subsequent fine-tuning stage, a Dual-scale Selective Fusion (DSF) module is designed (in our MT-Net) to synthesize missing-modality images by integrating multi-scale features extracted from the encoder of the pre-trained Edge-MAE. Further, this pre-trained encoder is also employed to extract high-level features from the synthesized image and the corresponding ground-truth image, which are required to be similar (consistent) during training. Experimental results show that our MT-Net achieves performance comparable to the competing methods even when using only $70\%$ of all available paired data. Our code will be publicly available at https://github.com/lyhkevin/MT-Net.
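For illustration, the sketch below shows how the pre-training objective described in the abstract might be composed: a reconstruction term over randomly masked patches, weighted patch-by-patch so that harder imputations contribute more, plus a whole edge-map estimation term. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation; the function name `pretrain_loss`, the tensor shapes, and the error-proportional difficulty weighting are hypothetical (the paper's actual patch-wise loss may use a different difficulty measure; see the linked repository).

```python
import torch
import torch.nn.functional as F

def pretrain_loss(pred_patches, target_patches, mask, pred_edge, target_edge,
                  edge_weight=1.0):
    """Combined self-supervised loss (illustrative sketch).

    pred_patches, target_patches: (B, N, P) patchified images,
        N patches of P pixels each.
    mask: (B, N) with 1 for masked patches, 0 for visible ones.
    pred_edge, target_edge: (B, 1, H, W) estimated / reference edge maps.
    """
    # Per-patch reconstruction error, averaged over the pixels in each patch.
    err = ((pred_patches - target_patches) ** 2).mean(dim=-1)  # (B, N)

    # Patch-wise weighting: patches that are harder to impute (larger error)
    # receive larger weights. The exact weighting form is an assumption.
    with torch.no_grad():
        w = err / (err.mean(dim=1, keepdim=True) + 1e-8)

    # Only masked patches contribute to the imputation term (MAE convention).
    recon = (w * err * mask).sum() / mask.sum().clamp(min=1)

    # Whole edge-map estimation term.
    edge = F.mse_loss(pred_edge, target_edge)
    return recon + edge_weight * edge

# Example with dummy tensors: 2 images, 196 patches of 256 pixels each.
if __name__ == "__main__":
    B, N, P = 2, 196, 256
    loss = pretrain_loss(
        torch.randn(B, N, P), torch.randn(B, N, P),
        (torch.rand(B, N) < 0.75).float(),          # ~75% masking ratio
        torch.rand(B, 1, 224, 224), torch.rand(B, 1, 224, 224),
    )
    print(loss.item())
```

The fine-tuning stage described in the abstract adds a further consistency term of the same flavor: high-level features that the frozen pre-trained encoder extracts from the synthesized image are pushed toward those it extracts from the ground-truth image, e.g., with an L1 or L2 distance.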