Paper Title
Density-Aware Personalized Training for Risk Prediction in Imbalanced Medical Data
Paper Authors
Paper Abstract
Medical events of interest, such as mortality, often occur at a low rate in electronic medical records, because most admitted patients survive. Training models on data with this imbalance (class density discrepancy) may lead to suboptimal predictions. Traditionally, this problem is addressed through ad hoc methods such as resampling or reweighting, but performance in many cases remains limited. We propose a framework for training models under this imbalance: 1) we first decouple the feature extraction and classification processes, adjusting training batches separately for each component to mitigate the bias caused by class density discrepancy; 2) we train the network with both a density-aware loss and a learnable cost matrix for misclassifications. On real-world medical datasets (TOPCAT and MIMIC-III), our model improves AUC-ROC, AUC-PRC, and Brier Skill Score compared with baselines in the domain.
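Of the three metrics named in the abstract, the Brier Skill Score is the least standardized, so a minimal sketch may help readers interpret it. The code below assumes the common definition BSS = 1 - BS / BS_ref, where BS is the mean squared error of the predicted probabilities and BS_ref is the Brier score of a reference forecast that always predicts the observed event rate. The abstract does not say which reference forecast the authors use, so that choice is an assumption here, and the function names `brier_score` and `brier_skill_score` are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def brier_skill_score(y_true, y_prob):
    """BSS = 1 - BS / BS_ref.

    Assumption: the reference forecast predicts the observed event rate
    for every patient. Other reference forecasts are possible.
    """
    base_rate = sum(y_true) / len(y_true)
    reference = brier_score(y_true, [base_rate] * len(y_true))
    if reference == 0:
        raise ValueError("BSS is undefined when all outcomes are identical")
    return 1.0 - brier_score(y_true, y_prob) / reference


# Toy imbalanced cohort: 1 event among 10 patients (illustrative data only).
outcomes = [0] * 9 + [1]
always_base_rate = [0.1] * 10        # forecast that ignores the patient
perfect = [0.0] * 9 + [1.0]          # forecast that matches every outcome

print(brier_skill_score(outcomes, always_base_rate))
print(brier_skill_score(outcomes, perfect))
```

Under this definition a score of 0 means the model is no better than predicting the event rate for everyone, 1 means perfect probabilistic predictions, and negative values mean the model is worse than the reference. This matters for imbalanced data, where a raw Brier score can look small simply because events are rare.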