Paper Title
Investigating Membership Inference Attacks under Data Dependencies
Paper Authors
Paper Abstract
Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $\epsilon$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.
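
The parenthetical claim that $\epsilon$ scales with the training set size $n$ can be read through the standard group-privacy property of differential privacy; the following is a minimal sketch of that standard argument, not the paper's exact derivation. If a mechanism $M$ is $\epsilon$-DP, then for datasets $D, D'$ differing in a single record,

$$\Pr[M(D) \in S] \le e^{\epsilon}\,\Pr[M(D') \in S],$$

and iterating this bound over datasets differing in $k$ records gives

$$\Pr[M(D) \in S] \le e^{k\epsilon}\,\Pr[M(D') \in S].$$

When samples are statistically dependent, adding or removing one individual's data can change the distribution of up to all $n$ records, so the effective guarantee for that individual degrades from $\epsilon$ to $n\epsilon$, which is vacuous for large $n$.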
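To make the attack itself concrete, below is a minimal sketch of a loss-threshold MIA in the style of Yeom et al. (2018), one common "off-the-shelf" attack; the function name, threshold heuristic, and synthetic demo are illustrative assumptions, not this paper's implementation.

```python
# Minimal sketch of a loss-threshold membership inference attack.
# Intuition: models typically fit training-set members more closely,
# so members tend to have lower loss than non-members.
import numpy as np

def loss_threshold_mia(per_sample_losses, threshold):
    """Flag a sample as a training-set member when the model's loss
    on it falls below the threshold. Returns a boolean array
    (True = predicted member)."""
    return np.asarray(per_sample_losses) < threshold

# Toy usage with synthetic losses (illustrative only): members drawn
# with lower average loss than non-members.
member_losses = np.random.exponential(scale=0.1, size=100)
non_member_losses = np.random.exponential(scale=1.0, size=100)

# Common heuristic: set the threshold to the average training loss.
threshold = member_losses.mean()

predictions = loss_threshold_mia(
    np.concatenate([member_losses, non_member_losses]), threshold)
true_membership = np.concatenate(
    [np.ones(100), np.zeros(100)]).astype(bool)
accuracy = (predictions == true_membership).mean()
print(f"attack accuracy on toy data: {accuracy:.2f}")
```

Under the paper's thesis, dependencies among training samples make such attacks markedly more effective than this i.i.d.-style toy setup would suggest.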