Paper Title

Semantic Decomposition Improves Learning of Large Language Models on EHR Data

Authors

Bloore, David A., Gauriau, Romane, Decker, Anna L., Oppenheim, Jacob

Abstract

Electronic health records (EHR) are widely believed to hold a profusion of actionable insights, encrypted in an irregular, semi-structured format, amidst a loud noise background. To simplify learning patterns of health and disease, medical codes in EHR can be decomposed into semantic units connected by hierarchical graphs. Building on earlier synergy between Bidirectional Encoder Representations from Transformers (BERT) and Graph Attention Networks (GAT), we present H-BERT, which ingests complete graph tree expansions of hierarchical medical codes, as opposed to only ingesting the leaves, and pushes patient-level labels down to each visit. This methodology significantly improves prediction of patient membership in over 500 medical diagnosis classes, as measured by aggregated AUC and APS, and creates distinct representations of patients in closely related but clinically distinct phenotypes.
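The core idea of "graph tree expansion" described in the abstract can be sketched as follows: each leaf medical code is replaced by its full root-to-leaf path of semantic units in the code hierarchy. This is a minimal illustrative sketch, assuming an ICD-10-style parent map; the toy hierarchy and the function name `expand_code` are hypothetical, not the paper's actual ontology or implementation.

```python
# Sketch: expanding a hierarchical medical code into its full ancestor
# path ("graph tree expansion"), rather than keeping only the leaf code.
# The hierarchy below is a hypothetical, illustrative fragment.
from typing import Dict, List

# Hypothetical parent map for ICD-10-style codes (child -> parent).
PARENTS: Dict[str, str] = {
    "E11.9": "E11",    # Type 2 diabetes without complications
    "E11": "E08-E13",  # Diabetes mellitus block
    "E08-E13": "IV",   # Endocrine, nutritional and metabolic diseases
}

def expand_code(leaf: str, parents: Dict[str, str]) -> List[str]:
    """Return the root-to-leaf path of semantic units for one code."""
    path = [leaf]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return list(reversed(path))

# A visit's codes become sequences of semantic units instead of opaque leaves.
print(expand_code("E11.9", PARENTS))  # ['IV', 'E08-E13', 'E11', 'E11.9']
```

Feeding the model every node on this path, instead of only the leaf, lets closely related codes share semantic units, which is what allows the hierarchy to simplify learning.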
