Paper title
VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography
Paper authors
Paper abstract
Mammography, or breast X-ray, is the most widely used imaging modality for detecting cancer and other breast diseases. In recent studies, deep learning-based computer-assisted detection and diagnosis (CADe or CADx) tools have been developed to support physicians and improve the accuracy of mammography interpretation. However, most published mammography datasets are either limited in sample size or digitized from screen-film mammography (SFM), which hinders the development of CADe and CADx tools based on full-field digital mammography (FFDM). To overcome this challenge, we introduce VinDr-Mammo, a new benchmark FFDM dataset for detecting and diagnosing breast cancer and other diseases in mammography. The dataset consists of 5,000 mammography exams, each of which has four standard views and is double read, with any disagreement resolved by arbitration. It provides breast-level assessments of Breast Imaging Reporting and Data System (BI-RADS) category and breast density. In addition, the dataset provides the category, location, and BI-RADS assessment of non-benign findings. We make VinDr-Mammo publicly available on PhysioNet as a new imaging resource to promote advances in developing CADe and CADx tools for breast cancer screening.