Paper Title
Unsupervised EHR-based Phenotyping via Matrix and Tensor Decompositions
Authors
Abstract
Computational phenotyping enables the unsupervised discovery of patient subgroups and their corresponding co-occurring medical conditions from electronic health records (EHRs). Typically, EHR data contain demographic information, diagnoses, and laboratory results. Discovering (novel) phenotypes is of potential prognostic and therapeutic value. Providing medical practitioners with transparent and interpretable results is an important requirement and an essential part of advancing precision medicine. Low-rank data approximation methods such as matrix decompositions (e.g., non-negative matrix factorization) and tensor decompositions (e.g., CANDECOMP/PARAFAC) have been shown to provide such transparent and interpretable insights. Recent developments have adapted these methods by incorporating constraints and regularizations that further facilitate interpretability. In addition, they offer solutions to common challenges in EHR data such as high dimensionality, data sparsity, and incompleteness. In particular, extracting temporal phenotypes from longitudinal EHRs has received much attention in recent years. In this paper, we provide a comprehensive review of low-rank approximation-based approaches for computational phenotyping. The existing literature is categorized into temporal vs. static phenotyping approaches based on matrix vs. tensor decompositions. Furthermore, we outline different approaches for validating phenotypes, i.e., assessing their clinical significance.
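To make the low-rank idea concrete, the following is a minimal sketch of non-negative matrix factorization applied to a toy patient-by-feature matrix. It is an illustration only, not a method from the reviewed literature: the feature names, the data, the rank, and the helper functions (`matmul`, `transpose`, `nmf`, `squared_error`) are all assumptions, and the solver is the standard multiplicative-update rule for the squared Frobenius error. Each row of `H` can be read as a candidate phenotype (a weighted set of co-occurring features), and each row of `W` gives a patient's membership in those phenotypes.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate X by W @ H with non-negative W and H (multiplicative updates)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random initialization (an arbitrary choice).
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H


def squared_error(X, W, H):
    """Squared Frobenius norm of X - W @ H."""
    R = matmul(W, H)
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))


# Hypothetical binary features; rows are patients, columns are features.
features = ["diabetes_dx", "high_hba1c", "hypertension_dx", "high_bp"]
X = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3

W, H = nmf(X, rank=2)
err = squared_error(X, W, H)

for k, row in enumerate(H):
    weights = ", ".join(f"{name}={w:.2f}" for name, w in zip(features, row))
    print(f"phenotype {k}: {weights}")
print(f"squared reconstruction error: {err:.4f}")
```

Tensor decompositions such as CANDECOMP/PARAFAC apply the same principle to data with an additional mode (for example, time), producing one factor matrix per mode instead of two. In practice, an established library implementation would normally be preferred over a hand-written solver like this one.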