Self-supervised 3D anatomy segmentation using self-distilled masked image transformer (SMIT)

Paper title

Self-supervised 3D anatomy segmentation using self-distilled masked image transformer (SMIT)

Authors

Jiang, Jue; Tyagi, Neelam; Tringale, Kathryn; Crane, Christopher; Veeraraghavan, Harini

Abstract

Vision transformers, with their ability to model long-range context more efficiently, have demonstrated impressive accuracy gains in several computer vision and medical image analysis tasks, including segmentation. However, such methods need large labeled datasets for training, which are hard to obtain for medical image analysis. Self-supervised learning (SSL) has demonstrated success in medical image segmentation using convolutional networks. In this work, we developed a Self-distillation learning with Masked Image modeling method to perform SSL for vision Transformers (SMIT), applied to 3D multi-organ segmentation from CT and MRI. Our contribution is a dense pixel-wise regression within masked patches, called masked image prediction, which we combined with masked patch token distillation as the pretext task to pre-train vision transformers. We show that our approach is more accurate and requires fewer fine-tuning datasets than other pretext tasks. Unlike prior medical image methods, which typically used image sets arising from disease sites and imaging modalities corresponding to the target tasks, we used 3,643 CT scans (602,708 images) arising from head and neck, lung, and kidney cancers as well as COVID-19 for pre-training. We applied the pre-trained model to abdominal organ segmentation from MRI of pancreatic cancer patients, as well as to segmentation of 13 different abdominal organs from publicly available CT. Our method showed a clear accuracy improvement (average DSC of 0.875 from MRI and 0.878 from CT) with a reduced requirement for fine-tuning datasets compared with commonly used pretext tasks. Extensive comparisons against multiple current SSL methods were done. Code will be made available upon acceptance for publication.
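
The accuracy figures above are reported as an average DSC (Dice similarity coefficient). As a minimal sketch of how such an average can be computed, the Python below scores each organ label as 2|A∩B| / (|A| + |B|) and then averages over labels. The function names, the flattened-list input, and the rule that a label absent from both maps scores 1.0 are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
from typing import Dict, Sequence


def dice_per_label(pred: Sequence[int], truth: Sequence[int],
                   labels: Sequence[int]) -> Dict[int, float]:
    """Return the Dice similarity coefficient for each label.

    `pred` and `truth` are flattened label maps (one integer per voxel).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of voxels")
    scores: Dict[int, float] = {}
    for label in labels:
        pred_count = sum(1 for p in pred if p == label)
        truth_count = sum(1 for t in truth if t == label)
        overlap = sum(1 for p, t in zip(pred, truth)
                      if p == label and t == label)
        denom = pred_count + truth_count
        # Assumed convention: a label absent from both maps scores 1.0.
        scores[label] = 1.0 if denom == 0 else 2.0 * overlap / denom
    return scores


def mean_dice(pred: Sequence[int], truth: Sequence[int],
              labels: Sequence[int]) -> float:
    """Average the per-label Dice scores (background is left out by the caller)."""
    if not labels:
        raise ValueError("at least one label is required")
    scores = dice_per_label(pred, truth, labels)
    return sum(scores.values()) / len(scores)


# Toy example: 8 voxels, background 0 and two hypothetical organ labels 1 and 2.
toy_pred = [0, 1, 1, 2, 2, 2, 0, 1]
toy_truth = [0, 1, 1, 1, 2, 2, 0, 0]
toy_scores = dice_per_label(toy_pred, toy_truth, labels=[1, 2])
toy_mean = mean_dice(toy_pred, toy_truth, labels=[1, 2])
print(toy_scores, round(toy_mean, 4))
```

Averaging per organ rather than pooling all voxels keeps small organs from being swamped by large ones, which is one common reason to report a mean DSC across labels.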
