Generative Data Augmentation for Non-IID Problem in Decentralized Clinical Machine Learning

Paper Title

Generative Data Augmentation for Non-IID Problem in Decentralized Clinical Machine Learning

Authors

Wang, Zirui; Duan, Shaoming; Wu, Chengyue; Lin, Wenhao; Zha, Xinyu; Han, Peiyi; Liu, Chuanyi

Abstract

Swarm learning (SL) is an emerging and promising decentralized machine learning paradigm that has achieved high performance in clinical applications. SL removes the central structure required by federated learning by combining edge computing with a blockchain-based peer-to-peer network. Although results are promising under the assumption of independent and identically distributed (IID) data across participants, SL suffers from performance degradation as the degree of non-IID data increases. To address this problem, we propose a generative augmentation framework for swarm learning called SL-GAN, which augments the non-IID data by generating synthetic data from participants. SL-GAN trains generators and discriminators locally and periodically aggregates them via a randomly elected coordinator in the SL network. Under standard assumptions, we theoretically prove the convergence of SL-GAN using stochastic approximations. Experimental results demonstrate that SL-GAN outperforms state-of-the-art methods on three real-world clinical datasets: Tuberculosis, Leukemia, and COVID-19.
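
The aggregation step described in the abstract (local training, then periodic merging through a randomly elected coordinator) can be illustrated with a minimal sketch. The abstract does not specify the merge rule, so plain element-wise averaging is an assumption here, and all names (elect_coordinator, average_parameters, aggregation_round) and the flat-list parameter layout are hypothetical, not the authors' implementation.

```python
import random


def elect_coordinator(node_ids, rng):
    """Pick one participant uniformly at random as this round's coordinator."""
    return rng.choice(node_ids)


def average_parameters(param_sets):
    """Element-wise mean of equally long parameter vectors.

    Plain averaging is an assumption; the abstract does not state the rule.
    """
    n = len(param_sets)
    return [sum(values) / n for values in zip(*param_sets)]


def aggregation_round(local_params, rng):
    """Merge generator and discriminator parameters across all nodes.

    local_params maps a node id to a dict with the hypothetical keys
    "generator" and "discriminator", each a flat list of floats.
    Returns the elected coordinator and the parameters every node holds
    after the merged values are broadcast back.
    """
    coordinator = elect_coordinator(sorted(local_params), rng)
    merged = {
        part: average_parameters([p[part] for p in local_params.values()])
        for part in ("generator", "discriminator")
    }
    # Each node receives its own copy of the merged parameters.
    updated = {
        node: {part: list(values) for part, values in merged.items()}
        for node in local_params
    }
    return coordinator, updated


if __name__ == "__main__":
    nodes = {
        "hospital_a": {"generator": [1.0, 2.0], "discriminator": [0.0]},
        "hospital_b": {"generator": [3.0, 4.0], "discriminator": [2.0]},
    }
    coordinator, nodes = aggregation_round(nodes, random.Random(0))
    print(coordinator, nodes["hospital_a"])
```

In a real deployment the coordinator election and parameter exchange would run over the SL peer-to-peer network rather than inside a single process; the sketch only shows the data flow of one aggregation round.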
