私有垂直联合群集

论文标题

私有垂直联合群集

Differentially Private Vertical Federated Clustering

论文作者

Li, Zitao, Wang, Tianhao, Li, Ninghui

论文摘要

在许多应用程序中，多方都有有关相同用户集但属性集的私人数据，并且服务器希望利用数据来训练模型。为了在保护数据主体的隐私时启用模型学习，我们需要垂直联合学习（VFL）技术，其中数据派对仅共享用于培训模型的信息，而不是私人数据。但是，确保共享信息在学习准确的模型的同时保持隐私是一项挑战。据我们所知，本文提出的算法是第一个实用的解决方案，用于差异化垂直联合K-Means聚类，服务器可以在其中获得具有可证明的差分隐私保证的全球中心。我们的算法假定一个不信任的中央服务器，该服务器汇总了本地数据派对的不同私有本地中心和成员编码。它根据收到的信息构建加权网格作为全局数据集的概要。最终中心是通过在加权网格上运行任何K-均值算法而产生的。我们的网格重量估计方法采用了基于Flajolet-Martin草图的新颖，轻巧和差异性私有的相交算法。为了提高两个以上数据派对的设置中的估计精度，我们进一步提出了权重估计算法的精致版本和参数调整策略，以减少最终的K-均值实用程序，以便在中央私人环境中接近它。我们为我们的算法计算的群集中心提供了理论实用性分析和实验评估结果，并表明我们的方法在理论上和经验上都比基于现有技术的两个基线在理论上和经验上的表现更好。

In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes, and a server wants to leverage the data to train a model. To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques, where the data parties share only information for training the model, instead of the private data. However, it is challenging to ensure that the shared information maintains privacy while learning accurate models. To the best of our knowledge, the algorithm proposed in this paper is the first practical solution for differentially private vertical federated k-means clustering, where the server can obtain a set of global centers with a provable differential privacy guarantee. Our algorithm assumes an untrusted central server that aggregates differentially private local centers and membership encodings from local data parties. It builds a weighted grid as the synopsis of the global dataset based on the received information. Final centers are generated by running any k-means algorithm on the weighted grid. Our approach for grid weight estimation uses a novel, light-weight, and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin sketch. To improve the estimation accuracy in the setting with more than two data parties, we further propose a refined version of the weights estimation algorithm and a parameter tuning strategy to reduce the final k-means utility to be close to that in the central private setting. We provide theoretical utility analysis and experimental evaluation results for the cluster centers computed by our algorithm and show that our approach performs better both theoretically and empirically than the two baselines based on existing techniques.

下载PDF全文

下载文献需遵守相关版权规定

论文标题