Paper Title

Distributed learning optimisation of Cox models can leak patient data: Risks and solutions

Authors

Carsten Brink, Christian Rønn Hansen, Matthew Field, Gareth Price, David Thwaites, Nis Sarup, Uffe Bernchou, Lois Holloway

Abstract

Medical data are often highly sensitive, and frequently there are missing data. Due to the data's sensitive nature, there is interest in creating modelling methods where the data are kept at each local centre to preserve their privacy, yet the model can be trained on, and learn from, data across multiple centres. One such approach is distributed machine learning (federated learning, collaborative learning), in which a model is iteratively calculated from aggregated local model information supplied by each centre. However, even though no specific data leave the centres, there is a potential risk that the exchanged information is sufficient to reconstruct all or part of the patient data, which would undermine the privacy-protecting rationale of distributed learning. This paper demonstrates that the optimisation of a Cox survival model can lead to patient data leakage. Following this, we suggest a way to optimise and validate a Cox model that avoids these problems in a secure way. The feasibility of the suggested method is demonstrated in provided MATLAB code, which also includes methods for handling missing data.
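The iterative scheme the abstract describes, a central model updated from aggregated local information, can be sketched in a few lines. The Python below is a minimal illustration (not the paper's MATLAB code): each centre evaluates the gradient of the Cox partial log-likelihood on its own patients and sends only that gradient to a coordinator, which sums the gradients and takes an ascent step. The function names are illustrative, the sketch assumes no tied event times, and risk sets are kept within each centre (effectively stratifying by centre), all assumptions of this example rather than the paper's method. Note that exchanging raw gradients is exactly the kind of aggregated information the paper shows can leak patient data, so this sketch illustrates the risk, not the secure solution the authors propose.

```python
import numpy as np

def local_cox_gradient(X, time, event, beta):
    """Gradient of the Cox partial log-likelihood on one centre's data.

    Risk sets are built only from this centre's patients, so just the
    aggregated gradient vector leaves the centre. Assumes no tied times.
    """
    order = np.argsort(-time)              # decreasing time
    X, event = X[order], event[order]
    w = np.exp(X @ beta)                   # relative hazards exp(eta_j)
    # After sorting by decreasing time, the risk set of subject i is
    # positions 0..i, so cumulative sums give the risk-set totals.
    cum_w = np.cumsum(w)
    cum_wx = np.cumsum(w[:, None] * X, axis=0)
    grad = np.zeros_like(beta)
    for i in np.nonzero(event)[0]:         # sum over observed events
        grad += X[i] - cum_wx[i] / cum_w[i]
    return grad

def federated_cox_fit(centres, n_features, lr=0.1, n_iter=200):
    """Naive federated ascent: centres send gradients, server sums them."""
    beta = np.zeros(n_features)
    n_total = sum(len(t) for _, t, _ in centres)
    for _ in range(n_iter):
        grad = sum(local_cox_gradient(X, t, e, beta) for X, t, e in centres)
        beta += lr * grad / n_total        # averaged gradient-ascent step
    return beta
```

On synthetic data with a known log-hazard ratio, the fitted coefficient recovers the true effect up to sampling noise; the design choice of per-centre risk sets makes this equivalent to a Cox model stratified by centre, which differs from the pooled model the paper optimises.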
