Paper Title
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care
Authors
Abstract
The COVID-19 pandemic has placed a heavy burden on healthcare systems worldwide and caused huge social disruption and economic loss. Many deep learning models have been proposed for clinical prediction tasks, such as mortality prediction for COVID-19 patients in intensive care units using Electronic Health Record (EHR) data. Despite their initial success in certain clinical applications, benchmarking results that allow a fair comparison are currently lacking, which makes it difficult to select the optimal model for clinical use. Furthermore, there is a discrepancy between how traditional prediction tasks are formulated and real-world clinical practice in intensive care. To fill these gaps, we propose two clinical prediction tasks for COVID-19 patients in intensive care units: Outcome-specific length-of-stay prediction and Early mortality prediction. The two tasks are adapted from the naive length-of-stay and mortality prediction tasks to accommodate clinical practice for COVID-19 patients. We propose fair, detailed, open-source data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models on the two tasks, including 5 machine learning models, 6 basic deep learning models, and 6 deep learning predictive models specifically designed for EHR data. We provide fair, reproducible benchmarking results for both tasks using data from two real-world COVID-19 EHR datasets. One dataset is publicly available without any inquiry, and the other can be accessed on request. We deploy all experiment results and models on an online platform, where clinicians and researchers can also upload their own data and obtain quick predictions from our trained models. We hope our efforts can further facilitate deep learning and machine learning research for COVID-19 predictive modeling.