Paper Title

Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models

Authors

Daniel Schalk, Bernd Bischl, David Rügamer

Abstract

Various privacy-preserving frameworks that respect the individual's privacy in the analysis of data have been developed in recent years. However, the available model classes, such as simple statistics or generalized linear models, lack the flexibility required for a good approximation of the underlying data-generating process in practice. In this paper, we propose an algorithm for distributed, privacy-preserving, and lossless estimation of generalized additive mixed models (GAMM) using component-wise gradient boosting (CWB). Using CWB allows us to reframe GAMM estimation as a distributed fitting of base learners using the $L_2$-loss. To account for the heterogeneity of different data location sites, we propose a distributed version of a row-wise tensor product that allows the computation of site-specific (smooth) effects. Our adaptation of CWB preserves all the important properties of the original algorithm, such as unbiased feature selection and the feasibility of fitting models in high-dimensional feature spaces, and yields model estimates equivalent to those of CWB on pooled data. In addition to deriving the equivalence of both algorithms, we showcase the efficacy of our algorithm on a distributed heart disease data set and compare it with state-of-the-art methods.
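The "lossless" claim for an $L_2$ base-learner fit can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`site_statistics`, `fit_from_statistics`, `row_wise_tensor`), the simulated data, and the choice to share only the aggregates $X^\top X$ and $X^\top r$ are assumptions made for illustration, and the abstract does not say which quantities the actual protocol exchanges. The sketch only shows why summing per-site aggregates reproduces the least-squares fit on pooled data, and what a row-wise tensor product of two design matrices looks like.

```python
import numpy as np

def site_statistics(X, r):
    """Computed locally at one site (hypothetical helper).
    Only these aggregates, not the rows of X or r, are shared."""
    return X.T @ X, X.T @ r

def fit_from_statistics(stats):
    """Sum the per-site aggregates and solve the normal equations."""
    XtX = sum(s[0] for s in stats)
    Xtr = sum(s[1] for s in stats)
    return np.linalg.solve(XtX, Xtr)

def row_wise_tensor(A, B):
    """Row-wise tensor product: row i of the result is kron(A[i], B[i])."""
    return np.einsum("ij,ik->ijk", A, B).reshape(A.shape[0], -1)

# Simulated data held at three sites (assumption: purely synthetic).
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5, 0.0])
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 4))
    r = X @ true_theta + rng.normal(scale=0.1, size=50)
    sites.append((X, r))

# Distributed fit: each site contributes only its aggregates.
theta_distributed = fit_from_statistics(
    [site_statistics(X, r) for X, r in sites]
)

# Reference fit on the pooled data, which a real deployment could not form.
X_pooled = np.vstack([X for X, _ in sites])
r_pooled = np.concatenate([r for _, r in sites])
theta_pooled = np.linalg.lstsq(X_pooled, r_pooled, rcond=None)[0]
```

Because $X^\top X = \sum_k X_k^\top X_k$ and $X^\top r = \sum_k X_k^\top r_k$ over sites $k$, the two estimates agree up to floating-point error. Combining a feature's basis matrix with a one-hot site indicator via `row_wise_tensor` gives one block of coefficients per site, which is one plausible reading of the site-specific effects mentioned above.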
