Paper Title

Risk of Bias in Chest Radiography Deep Learning Foundation Models

Paper Authors

Ben Glocker, Charles Jones, Melanie Roschewitz, Stefan Winzeck

Paper Abstract

Purpose: To analyze a recently published chest radiography foundation model for the presence of biases that could lead to subgroup performance disparities across biological sex and race. Materials and Methods: This retrospective study used 127,118 chest radiographs from 42,884 patients (mean age, 63 years ± 17 [SD]; 23,623 male, 19,261 female) from the CheXpert dataset collected between October 2002 and July 2017. To determine the presence of bias in features generated by a chest radiography foundation model and a baseline deep learning model, dimensionality reduction methods together with two-sample Kolmogorov-Smirnov tests were used to detect distribution shifts across sex and race. A comprehensive disease detection performance analysis was then performed to associate any biases in the features with specific disparities in classification performance across patient subgroups. Results: Ten of twelve pairwise comparisons across biological sex and race showed statistically significant differences in the studied foundation model, compared with four significant tests in the baseline model. Significant differences were found between male and female (P < .001) and Asian and Black patients (P < .001) in the feature projections that primarily capture disease. Compared with average model performance across all subgroups, classification performance on the 'no finding' label dropped between 6.8% and 7.8% for female patients, and performance in detecting 'pleural effusion' dropped between 10.7% and 11.6% for Black patients. Conclusion: The studied chest radiography foundation model demonstrated racial and sex-related bias leading to disparate performance across patient subgroups and may be unsafe for clinical applications.
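For readers unfamiliar with the testing procedure described in the Materials and Methods, the sketch below illustrates the general idea: project high-dimensional image features onto a few components, then apply a two-sample Kolmogorov-Smirnov test per component to check whether the feature distributions differ between patient subgroups. This is a minimal illustration under assumed inputs, not the authors' released code; the variables `features` and `sex` are hypothetical placeholders standing in for model embeddings and patient metadata.

```python
# Minimal sketch of a subgroup distribution-shift check, assuming hypothetical
# feature vectors and metadata (not the paper's actual data or code).
import numpy as np
from scipy.stats import ks_2samp
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 512))           # placeholder: per-image model embeddings
sex = rng.choice(["male", "female"], size=1000)   # placeholder: subgroup labels

# Reduce the features to a few components first; the univariate KS test
# is then applied to each projected dimension separately.
proj = PCA(n_components=4).fit_transform(features)

# Two-sample KS test per component: a small P value indicates the two
# subgroups are drawn from different feature distributions along that axis.
for c in range(proj.shape[1]):
    stat, p = ks_2samp(proj[sex == "male", c], proj[sex == "female", c])
    print(f"component {c}: KS statistic = {stat:.3f}, P = {p:.3g}")
```

The same per-component comparison can be repeated for other subgroup pairs (for example, across race) to reproduce the kind of pairwise testing the abstract reports.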
