Paper Title


Bias in Machine Learning Models Can Be Significantly Mitigated by Careful Training: Evidence from Neuroimaging Studies

Paper Authors

Rongguang Wang, Pratik Chaudhari, Christos Davatzikos

Abstract


Despite the great promise that machine learning has offered in many fields of medicine, it has also raised concerns about potential biases and poor generalization across genders, age distributions, races and ethnicities, hospitals, and data acquisition equipment and protocols. In the current study, and in the context of three brain diseases, we provide evidence suggesting that, when properly trained, machine learning models can generalize well across diverse conditions and do not necessarily suffer from bias. Specifically, by using multi-study magnetic resonance imaging consortia for diagnosing Alzheimer's disease, schizophrenia, and autism spectrum disorder, we find that well-trained models have a high area under the curve (AUC) on subjects across different subgroups pertaining to attributes such as gender, age, racial groups, and different clinical studies, and are unbiased under multiple fairness metrics such as demographic parity difference, equalized odds difference, equal opportunity difference, etc. We find that models incorporating multi-source data from demographic, clinical, and genetic factors and cognitive scores are also unbiased. These models have better predictive AUC across subgroups than those trained only with imaging features, but there are also situations in which these additional features do not help.
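For readers unfamiliar with the group-fairness metrics named in the abstract, the following is a minimal sketch of how they are typically computed from binary labels, binary predictions, and a per-subject group attribute (e.g. gender). This is an illustrative implementation of the standard definitions, not the paper's actual evaluation pipeline; all function names are our own.

```python
# Sketch of the fairness metrics mentioned in the abstract, using the
# standard definitions. Inputs: y (true labels), yhat (predicted labels),
# groups (a subgroup attribute per subject), all as parallel lists.

def _rate(preds):
    """Fraction of positive predictions; 0.0 on an empty list."""
    return sum(preds) / len(preds) if preds else 0.0

def _selection_rate(yhat, groups, g):
    """P(yhat = 1 | group = g)."""
    return _rate([p for p, grp in zip(yhat, groups) if grp == g])

def _tpr(y, yhat, groups, g):
    """True positive rate within group g: P(yhat = 1 | y = 1, group = g)."""
    return _rate([p for t, p, grp in zip(y, yhat, groups) if grp == g and t == 1])

def _fpr(y, yhat, groups, g):
    """False positive rate within group g: P(yhat = 1 | y = 0, group = g)."""
    return _rate([p for t, p, grp in zip(y, yhat, groups) if grp == g and t == 0])

def demographic_parity_difference(y, yhat, groups):
    """Largest gap in selection rate between any two subgroups."""
    rates = [_selection_rate(yhat, groups, g) for g in set(groups)]
    return max(rates) - min(rates)

def equal_opportunity_difference(y, yhat, groups):
    """Largest gap in true positive rate between any two subgroups."""
    tprs = [_tpr(y, yhat, groups, g) for g in set(groups)]
    return max(tprs) - min(tprs)

def equalized_odds_difference(y, yhat, groups):
    """Larger of the TPR gap and the FPR gap across subgroups."""
    gs = set(groups)
    tprs = [_tpr(y, yhat, groups, g) for g in gs]
    fprs = [_fpr(y, yhat, groups, g) for g in gs]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Toy example: a model that selects more often, and more accurately,
# for group 'M' than for group 'F'.
y      = [1, 1, 0, 0, 1, 1, 0, 0]
yhat   = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ['F', 'F', 'F', 'F', 'M', 'M', 'M', 'M']

print(demographic_parity_difference(y, yhat, groups))  # 0.5
print(equal_opportunity_difference(y, yhat, groups))   # 0.5
print(equalized_odds_difference(y, yhat, groups))      # 0.5
```

A value of 0.0 for each metric would indicate parity between subgroups; the paper's claim of unbiased models corresponds to these differences being small across gender, age, race, and study subgroups.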
