Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering

Paper Title

Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering

Paper Authors

Nguyen, Duy M. H.; Nguyen, Hoang; Truong, Mai T. N.; Cao, Tri; Nguyen, Binh T.; Ho, Nhat; Swoboda, Paul; Albarqouni, Shadi; Xie, Pengtao; Sonntag, Daniel

Paper Abstract

Collecting large-scale medical datasets with fully annotated samples for training deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing a vector embedding for each slice and then assembling a holistic feature through deformable self-attention mechanisms in a Transformer, which allows long-range dependencies between slices inside 3D volumes to be incorporated. These holistic features are further used to define a novel 3D clustering agreement-based SSL task and a masked embedding prediction task inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structure segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
