Paper title
Federated clustering with GAN-based data synthesis
Paper authors
Paper abstract
Federated clustering (FC) is an extension of centralized clustering to federated settings. The key challenge is how to construct a global similarity measure without sharing private data: local similarity may be insufficient to group local data correctly, and the similarity of samples across clients cannot be measured directly because of privacy constraints. The most straightforward approach to FC is to use methods extended from centralized ones, such as K-means (KM) and fuzzy c-means (FCM). However, these methods are vulnerable to non-independent-and-identically-distributed (non-IID) data among clients. To handle this, we propose a new federated clustering framework, named synthetic data aided federated clustering (SDA-FC). It trains a generative adversarial network (GAN) locally in each client and uploads the generated synthetic data to the server, where KM or FCM is performed on the synthetic data. The synthetic data can make the model immune to the non-IID problem and enable us to capture the global similarity characteristics more effectively without sharing private data. Comprehensive experiments reveal the advantages of SDA-FC, including superior performance in addressing the non-IID problem and device failures.
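The data flow described in the abstract (local generator per client, synthetic data uploaded, K-means on the server) can be illustrated with a minimal sketch. This is not the authors' implementation: to keep the example short and dependency-free, each client's GAN is replaced by a single fitted Gaussian, which is only a stand-in generative model. The function names, the `n_synthetic` parameter, and the farthest-first K-means initialization are all assumptions made for illustration.

```python
import numpy as np


def fit_local_generator(data):
    """Client side. ASSUMPTION: a single Gaussian stands in for the locally
    trained GAN; the paper trains a GAN here instead."""
    mean = data.mean(axis=0)
    # Small ridge keeps the covariance positive definite.
    cov = np.cov(data, rowvar=False) + 1e-6 * np.eye(data.shape[1])
    return mean, cov


def sample_synthetic(generator, n, rng):
    """Client side: draw synthetic samples; only these leave the client."""
    mean, cov = generator
    return rng.multivariate_normal(mean, cov, size=n)


def kmeans(points, k, n_iter=50):
    """Server side: Lloyd's algorithm with farthest-first initialization
    (the initialization is an illustrative choice, not from the paper)."""
    centers = [points[0]]
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen center.
        d = np.min(
            np.linalg.norm(points[:, None, :] - np.array(centers)[None, :, :], axis=2),
            axis=1,
        )
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers


def sda_fc_sketch(client_datasets, k, n_synthetic=200, seed=0):
    """Pool synthetic data from all clients and cluster it on the server.
    Private client data is never concatenated or shared."""
    rng = np.random.default_rng(seed)
    synthetic = [
        sample_synthetic(fit_local_generator(data), n_synthetic, rng)
        for data in client_datasets
    ]
    return kmeans(np.vstack(synthetic), k)


if __name__ == "__main__":
    # Extreme non-IID toy case: each client holds exactly one cluster.
    rng = np.random.default_rng(1)
    true_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
    clients = [rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in true_centers]
    print(sda_fc_sketch(clients, k=3))
```

In this toy setup no single client sees more than one cluster, yet the server can recover all three centroids because the pooled synthetic data approximates the global distribution. A single Gaussian would be a poor generator for clients whose local data contains several clusters, which is one reason a more expressive model such as a GAN is used in the framework described above.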