AD-BERT: Using Pre-trained Contextualized Embeddings to Predict the Progression from Mild Cognitive Impairment to Alzheimer's Disease

Paper Title

AD-BERT: Using Pre-trained contextualized embeddings to Predict the Progression from Mild Cognitive Impairment to Alzheimer's Disease

Authors

Chengsheng Mao, Jie Xu, Luke Rasmussen, Yikuan Li, Prakash Adekkanattu, Jennifer Pacheco, Borna Bonakdarpour, Robert Vassar, Guoqian Jiang, Fei Wang, Jyotishman Pathak, Yuan Luo

Abstract

Objective: We developed a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, using unstructured clinical notes from electronic health records (EHRs), to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). Materials and Methods: We identified 3657 patients diagnosed with MCI between 2000 and 2020, together with their progress notes, from the Northwestern Medicine Enterprise Data Warehouse (NMEDW). Only progress notes dated no later than the first MCI diagnosis were used for prediction. We first preprocessed the notes by deidentification, cleaning, and splitting, and then pretrained a BERT model for AD (AD-BERT) on the preprocessed notes, starting from the publicly available Bio+Clinical BERT. The embeddings of all sections of a patient's notes, as processed by AD-BERT, were combined by MaxPooling to compute the probability of MCI-to-AD progression. For replication, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. Results: Compared with the 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with an area under the receiver operating characteristic curve (AUC) of 0.8170 and an F1 score of 0.4178 on the NMEDW dataset, and an AUC of 0.8830 and an F1 score of 0.6836 on the WCM dataset. Conclusion: We developed a deep learning framework using BERT models that provides an effective solution for predicting MCI-to-AD progression from clinical notes.
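
The pooling step described in the abstract can be illustrated with a minimal sketch. It assumes each note section has already been mapped to a fixed-length embedding vector (here, toy 3-dimensional vectors rather than real AD-BERT outputs), takes the element-wise maximum across sections, and feeds the pooled vector to a logistic layer. The logistic layer, the function names, and the weights are hypothetical choices for illustration; the abstract does not specify the classifier head.

```python
import math
from typing import List, Sequence


def max_pool(section_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over all section embeddings of one patient."""
    if not section_embeddings:
        raise ValueError("need at least one section embedding")
    # zip(*rows) groups the values of each embedding dimension together.
    return [max(dim_values) for dim_values in zip(*section_embeddings)]


def progression_probability(
    patient_vector: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Assumed logistic head: sigmoid of a linear score on the pooled vector."""
    logit = sum(w * x for w, x in zip(weights, patient_vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for per-section embeddings (hypothetical values).
sections = [
    [0.1, -0.4, 0.3],
    [0.5, 0.2, -0.1],
    [-0.2, 0.0, 0.7],
]

pooled = max_pool(sections)  # [0.5, 0.2, 0.7]
prob = progression_probability(pooled, weights=[1.0, -1.0, 0.5], bias=-0.3)
print(pooled, prob)
```

Max pooling yields a patient-level vector of the same size no matter how many sections a patient has, which is one plausible reason to prefer it when note counts vary widely across patients.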
