Paper Title
FedPseudo: Pseudo value-based Deep Learning Models for Federated Survival Analysis
Paper Authors
Abstract
Survival analysis, also known as time-to-event analysis, is an important problem in healthcare because it has a wide-ranging impact on patients and palliative care. Many survival analysis methods assume that the survival data is centrally available, either from one medical center or through data sharing across multiple centers. However, the sensitivity of patient attributes and strict privacy laws increasingly forbid the sharing of healthcare data. To address this challenge, the research community has turned to decentralized training and the sharing of model parameters using the Federated Learning (FL) paradigm. In this paper, we study the use of FL for performing survival analysis on distributed healthcare datasets. Recently, the popular Cox proportional hazards (CPH) models have been adapted to FL settings; however, due to their linearity and proportional hazards assumptions, CPH models yield suboptimal performance, especially on non-linear, non-IID, and heavily censored survival datasets. To overcome the challenges of existing federated survival analysis methods, we leverage the predictive accuracy of deep learning models and the power of pseudo values to propose a first-of-its-kind pseudo value-based deep learning model for federated survival analysis (FSA), called FedPseudo. Furthermore, we introduce a novel approach for deriving pseudo values for survival probability in FL settings that speeds up the computation of pseudo values. Extensive experiments on synthetic and real-world datasets show that our pseudo value-based FL framework achieves performance similar to that of the best centrally trained deep survival analysis model. Moreover, our proposed FL approach obtains the best results across various censoring settings.
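For readers unfamiliar with pseudo values, the following is a minimal sketch of the standard centralized jackknife construction on top of the Kaplan-Meier estimator. It is not the federated derivation the abstract refers to, whose details are not given here. The function names `km_survival` and `jackknife_pseudo_values` are hypothetical and chosen only for illustration.

```python
import numpy as np


def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) for right-censored data.

    times  : observed times (event or censoring)
    events : 1/True if the event was observed, 0/False if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    # Multiply the conditional survival factors over distinct event times <= t.
    for u in np.unique(times[events]):
        if u > t:
            break
        at_risk = np.sum(times >= u)              # subjects still under observation at u
        deaths = np.sum((times == u) & events)    # events occurring exactly at u
        surv *= 1.0 - deaths / at_risk
    return surv


def jackknife_pseudo_values(times, events, t):
    """Leave-one-out pseudo values for the survival probability at time t.

    Pseudo value for subject i: n * S(t) - (n - 1) * S^{(-i)}(t),
    where S^{(-i)} is the Kaplan-Meier estimate without subject i.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = len(times)
    full = km_survival(times, events, t)
    pseudo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        loo = km_survival(times[keep], events[keep], t)
        pseudo[i] = n * full - (n - 1) * loo
    return pseudo


# Example: four subjects, one censored at time 4; pseudo values at t = 5.
pv = jackknife_pseudo_values([2, 4, 6, 8], [1, 0, 1, 1], t=5.0)
```

When no observation is censored, each pseudo value reduces to the indicator of whether that subject survived past t, which is why pseudo values can stand in as regression targets for a model that does not handle censoring itself. The leave-one-out loop recomputes the estimator n times, which is the kind of cost that a faster derivation would aim to reduce.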