论文标题
生物医学图像的复合图分离:挖掘大型数据集用于自我监督学习
Compound Figure Separation of Biomedical Images: Mining Large Datasets for Self-supervised Learning
论文作者
论文摘要
随着自我监督学习的快速发展(例如,对比度学习),在医学图像分析中广泛认识到具有大规模图像(即使没有注释)来训练更概括的AI模型的重要性。但是,大规模收集大规模任务的非大规模数据对单个实验室可能具有挑战性。现有的在线资源(例如数字书籍,出版物和搜索引擎)为获取大型图像提供了新的资源。但是,在医疗保健中已发表的图像(例如放射学和病理学)由大量带有子图的化合物数字组成。为了将复合图提取到可用的单个图像中以进行下游学习,我们提出了一个简单的复合图形分离(SIMCFS)框架,而无需使用传统所需的检测边界框注释,并具有新的损失函数和硬案例模拟。我们的技术贡献是四倍:(1)我们引入了一个基于模拟的培训框架,该框架最大程度地减少了对资源广泛的边界框注释的需求; (2)我们提出了一种新的侧面损失,该损失已针对复合人物分离进行了优化; (3)我们提出了一种阶层内图像增强方法来模拟硬病例; (4)据我们所知,这是第一项评估使用复合图像分离来利用自学学习的功效的研究。从结果来看,提议的SIMCF在ImageClef 2016复合人物分离数据库上实现了最先进的性能。使用大规模开采的数字预审预定的学习模型通过对比度学习算法提高了下游图像分类任务的准确性。 SIMCF的源代码可在https://github.com/hrlblab/imageseperation上公开获得。
With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of having large-scale images (even without annotations) for training a more generalizable AI model has been widely recognized in medical image analysis. However, collecting large-scale task-specific unannotated data at scale can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, provide a new resource for obtaining large-scale images. However, published images in healthcare (e.g., radiology and pathology) consist of a considerable amount of compound figures with subplots. In order to extract and separate compound figures into usable individual images for downstream learning, we propose a simple compound figure separation (SimCFS) framework without using the traditionally required detection bounding box annotations, with a new loss function and a hard case simulation. Our technical contribution is four-fold: (1) we introduce a simulation-based training framework that minimizes the need for resource extensive bounding box annotations; (2) we propose a new side loss that is optimized for compound figure separation; (3) we propose an intra-class image augmentation method to simulate hard cases; and (4) to the best of our knowledge, this is the first study that evaluates the efficacy of leveraging self-supervised learning with compound image separation. From the results, the proposed SimCFS achieved state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. The pretrained self-supervised learning model using large-scale mined figures improved the accuracy of downstream image classification tasks with a contrastive learning algorithm. The source code of SimCFS is made publicly available at https://github.com/hrlblab/ImageSeperation.