A Scalable Workflow to Build Machine Learning Classifiers with Clinician-in-the-Loop to Identify Patients in Specific Diseases

Paper Title

A Scalable Workflow to Build Machine Learning Classifiers with Clinician-in-the-Loop to Identify Patients in Specific Diseases

Authors

Zhang, Jingqing; Sharma, Atri; Bolanos, Luis; Li, Tong; Tanwar, Ashwani; Gupta, Vibhor; Guo, Yike

Abstract

Clinicians may rely on medical coding systems such as International Classification of Diseases (ICD) to identify patients with diseases from Electronic Health Records (EHRs). However, due to the lack of detail and specificity as well as a probability of miscoding, recent studies suggest the ICD codes often cannot characterise patients accurately for specific diseases in real clinical practice, and as a result, using them to find patients for studies or trials can result in high failure rates and missing out on uncoded patients. Manual inspection of all patients at scale is not feasible as it is highly costly and slow. This paper proposes a scalable workflow which leverages both structured data and unstructured textual notes from EHRs with techniques including NLP, AutoML and Clinician-in-the-Loop mechanism to build machine learning classifiers to identify patients at scale with given diseases, especially those who might currently be miscoded or missed by ICD codes. Case studies in the MIMIC-III dataset were conducted where the proposed workflow demonstrates a higher classification performance in terms of F1 scores compared to simply using ICD codes on gold testing subset to identify patients with Ovarian Cancer (0.901 vs 0.814), Lung Cancer (0.859 vs 0.828), Cancer Cachexia (0.862 vs 0.650), and Lupus Nephritis (0.959 vs 0.855). Also, the proposed workflow that leverages unstructured notes consistently outperforms the baseline that uses structured data only with an increase of F1 (Ovarian Cancer 0.901 vs 0.719, Lung Cancer 0.859 vs 0.787, Cancer Cachexia 0.862 vs 0.838 and Lupus Nephritis 0.959 vs 0.785). Experiments on the large testing set also demonstrate the proposed workflow can find more patients who are miscoded or missed by ICD codes. Moreover, interpretability studies are also conducted to clinically validate the top impact features of the classifiers.
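The F1 comparisons reported in the abstract can be tabulated to make the size of each improvement explicit. The following is a minimal sketch, not code from the paper: the names `REPORTED_F1`, `f1_score`, and `f1_gains` are hypothetical, and the only data used are the F1 values quoted above. The `f1_score` helper simply restates the standard definition of F1 in terms of true positives, false positives, and false negatives.

```python
# F1 scores quoted in the abstract (gold testing subset, MIMIC-III).
# "workflow" = proposed workflow, "icd" = ICD codes only,
# "structured_only" = baseline using structured data only.
REPORTED_F1 = {
    "Ovarian Cancer": {"workflow": 0.901, "icd": 0.814, "structured_only": 0.719},
    "Lung Cancer": {"workflow": 0.859, "icd": 0.828, "structured_only": 0.787},
    "Cancer Cachexia": {"workflow": 0.862, "icd": 0.650, "structured_only": 0.838},
    "Lupus Nephritis": {"workflow": 0.959, "icd": 0.855, "structured_only": 0.785},
}


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 definition: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_gains(scores: dict) -> dict:
    """Absolute F1 gain of the workflow over each baseline, per disease."""
    return {
        disease: {
            "vs_icd": round(s["workflow"] - s["icd"], 3),
            "vs_structured_only": round(s["workflow"] - s["structured_only"], 3),
        }
        for disease, s in scores.items()
    }


if __name__ == "__main__":
    for disease, gain in f1_gains(REPORTED_F1).items():
        print(f"{disease}: {gain}")
```

Read this way, the largest gain over ICD codes is for Cancer Cachexia, while the largest gain over the structured-only baseline is for Ovarian Cancer.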
