Paper Title

A Penalty Approach for Normalizing Feature Distributions to Build Confounder-Free Models

Authors

Anthony Vento, Qingyu Zhao, Robert Paul, Kilian M. Pohl, Ehsan Adeli

Abstract

Translating machine learning algorithms into clinical applications requires addressing challenges related to interpretability, such as accounting for the effect of confounding variables (or metadata). Confounding variables affect the relationship between input training data and target outputs. When we train a model on such data, confounding variables will bias the distribution of the learned features. A recent promising solution, MetaData Normalization (MDN), estimates the linear relationship between the metadata and each feature based on a non-trainable closed-form solution. However, this estimation is confined by the sample size of a mini-batch and thereby may cause the approach to be unstable during training. In this paper, we extend the MDN method by applying a Penalty approach (referred to as PMDN). We cast the problem into a bi-level nested optimization problem. We then approximate this optimization problem using a penalty method so that the linear parameters within the MDN layer are trainable and learned on all samples. This enables PMDN to be plugged into any architecture, even those unfit to run batch-level operations, such as transformers and recurrent models. We show improvement in model accuracy and greater independence from confounders using PMDN over MDN in a synthetic experiment and a multi-label, multi-site dataset of magnetic resonance images (MRIs).
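To make the two estimation strategies in the abstract concrete, below is a minimal numpy sketch (not the authors' implementation; the data, variable names, learning rate, and step count are all illustrative assumptions). It contrasts an MDN-style step, which solves for the metadata-to-feature coefficients `beta` in closed form on one batch, with a PMDN-style step, where `beta` is a trainable parameter updated by gradient descent on a squared-residual penalty, so its estimate can accumulate over all samples rather than being recomputed per mini-batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: features partially driven by a confounder (metadata column).
n, d = 64, 4
M = np.column_stack([np.ones(n), rng.normal(size=n)])  # [bias, confounder]
true_beta = rng.normal(size=(2, d))
F = M @ true_beta + 0.1 * rng.normal(size=(n, d))      # confounded features

# MDN-style step (closed form, per batch): beta = (M^T M)^{-1} M^T F,
# and the residual F - M @ beta removes the linear confounder effect.
beta_cf, *_ = np.linalg.lstsq(M, F, rcond=None)
residual_mdn = F - M @ beta_cf

# PMDN-style step (sketch): beta is a trainable parameter updated by
# gradient descent on the penalty ||F - M beta||^2, so it can be learned
# across all samples instead of re-solved on each mini-batch.
beta = np.zeros((2, d))
lr = 0.01
for _ in range(2000):
    grad = -2 * M.T @ (F - M @ beta) / n
    beta -= lr * grad
residual_pmdn = F - M @ beta

# Both residuals should be (near-)orthogonal to the metadata columns.
print(np.abs(M.T @ residual_mdn).max())   # ~0 by construction
print(np.abs(M.T @ residual_pmdn).max())  # small after convergence
```

At convergence the penalty-trained `beta` matches the closed-form solution, but because it lives in the layer's parameters it remains usable at inference time and in architectures where batch-level statistics are unavailable, which is the property the abstract highlights for transformers and recurrent models.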
