Paper Title

Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training

Authors

Shu Zhang, Zihao Li, Hong-Yu Zhou, Jiechao Ma, Yizhou Yu

Abstract

The difficulties in both data acquisition and annotation substantially restrict the sample sizes of training datasets for 3D medical imaging applications. As a result, constructing high-performance 3D convolutional neural networks from scratch remains a difficult task in the absence of sufficient pre-trained parameters. Previous efforts on 3D pre-training have frequently relied on self-supervised approaches, which use either predictive or contrastive learning on unlabeled data to build invariant 3D representations. However, because large-scale supervision information is unavailable, obtaining semantically invariant and discriminative representations from these learning frameworks remains problematic. In this paper, we revisit an innovative yet simple fully-supervised 3D network pre-training framework to take advantage of semantic supervision from large-scale 2D natural image datasets. With a redesigned 3D network architecture, reformulated natural images are used to address the problem of data scarcity and develop powerful 3D representations. Comprehensive experiments on four benchmark datasets demonstrate that the proposed pre-trained models can effectively accelerate convergence while also improving accuracy for a variety of 3D medical imaging tasks such as classification, segmentation and detection. In addition, compared to training from scratch, it can save up to 60% of annotation effort. On the NIH DeepLesion dataset, it likewise achieves state-of-the-art detection performance, outperforming earlier self-supervised and fully-supervised pre-training approaches, as well as methods trained from scratch. To facilitate further development of 3D medical models, our code and pre-trained model weights are publicly available at https://github.com/urmagicsmine/CSPR.
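The abstract's central idea is reformulating 2D natural images so that a 3D network can be pre-trained on them with full supervision. The paper does not specify the exact transform here, so the sketch below is only an illustrative assumption: it lifts a labeled 2D image into a pseudo-3D volume by replicating it along a depth axis, yielding the `(D, H, W)` input a 3D CNN expects. The function name and the `depth` parameter are hypothetical, not from the paper.

```python
import numpy as np

def image_to_pseudo_volume(img2d: np.ndarray, depth: int = 16) -> np.ndarray:
    """Hypothetical variable-dimension transform (an assumption, not the
    paper's exact method): replicate a 2D image of shape (H, W) along a new
    depth axis to form a pseudo-3D volume of shape (depth, H, W), so that
    2D supervised data can feed a 3D convolutional network."""
    return np.repeat(img2d[np.newaxis, :, :], depth, axis=0)

# A labeled 224x224 natural image becomes a 16-slice volume; its 2D class
# label can then supervise the 3D network directly.
vol = image_to_pseudo_volume(np.random.rand(224, 224), depth=16)
```

Because every slice carries the same semantic content, the original 2D classification labels remain valid for the lifted volume, which is what makes large-scale supervised 3D pre-training possible under this kind of scheme.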
