Paper Title
Differentially Private Methods for Compositional Data
Paper Authors
Abstract
Confidential data, such as electronic health records, activity data from wearable devices, and geolocation data, are becoming increasingly prevalent. Differential privacy provides a framework to conduct statistical analyses while mitigating the risk of leaking private information. Compositional data, which consist of vectors with positive components that add up to a constant, have received little attention in the differential privacy literature. This article proposes differentially private approaches for analyzing compositional data using the Dirichlet distribution. We explore several methods, including Bayesian and bootstrap procedures. For the Bayesian methods, we consider posterior inference techniques based on Markov Chain Monte Carlo, Approximate Bayesian Computation, and asymptotic approximations. We conduct an extensive simulation study to compare these approaches and make evidence-based recommendations. Finally, we apply the methodology to a data set from the American Time Use Survey.
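To make the setting concrete, the following is a minimal sketch and not the authors' method. It simulates compositional data from a Dirichlet distribution and releases a noisy version of the mean log-components (the sufficient statistics of the Dirichlet likelihood) using the standard Laplace mechanism. The function name `private_mean_log`, the clipping bound `lower`, and the parameter values are assumptions chosen for illustration; the sample size is treated as public.

```python
import numpy as np


def private_mean_log(x, epsilon, lower=1e-3, rng=None):
    """Release the mean of log-components with Laplace noise.

    Illustrative only: `lower` is a hypothetical clipping bound that keeps
    log(x) inside [log(lower), 0] so the statistic has bounded sensitivity.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n, d = x.shape

    # Clip each component so that every log value lies in [log(lower), 0].
    clipped = np.clip(x, lower, 1.0)
    mean_log = np.log(clipped).mean(axis=0)

    # Replacing one record changes each coordinate of the mean by at most
    # |log(lower)| / n, so the L1 sensitivity is at most d * |log(lower)| / n.
    sensitivity = d * abs(np.log(lower)) / n

    # Laplace mechanism: noise scale = L1 sensitivity / epsilon.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=d)
    return mean_log + noise


# Simulated compositional data: each row is positive and sums to 1.
rng = np.random.default_rng(0)
data = rng.dirichlet(alpha=[2.0, 3.0, 5.0], size=500)

release = private_mean_log(data, epsilon=1.0, rng=rng)
print(release)
```

The clipping step trades a small bias for bounded sensitivity: a smaller `lower` distorts fewer observations but requires more noise for the same `epsilon`.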