Paper Title
In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning
Paper Authors
Paper Abstract
When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to attain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age. Although this attack does not directly violate the differential privacy guarantee, it clearly violates privacy norms and expectations, and would not be possible at all without the noise inserted to obtain differential privacy. In fact, counter-intuitively, the attack becomes easier as we add more noise to provide stronger differential privacy. We hope this encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.
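The following is a minimal sketch, not the paper's implementation, of the mechanism the abstract describes: PATE-style noisy-argmax aggregation, and how repeated queries on the same input expose information about the full vote histogram. The vote counts, noise scale, and query budget below are invented for illustration.

```python
# Minimal sketch (illustrative, not the authors' code) of PATE's
# noisy-argmax aggregation and the vote-leakage effect it enables.
# NUM_CLASSES, votes, noise_scale, and n_queries are assumed values.
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 5
votes = np.array([40, 35, 15, 7, 3])  # hypothetical teacher vote histogram
noise_scale = 20.0                    # larger scale = stronger DP guarantee

def noisy_argmax(votes, scale):
    """PATE-style aggregation: add Laplace noise to each count, return argmax."""
    return int(np.argmax(votes + rng.laplace(scale=scale, size=len(votes))))

# The adversary repeats the same query; each response is a fresh noisy argmax.
n_queries = 50_000
counts = np.bincount(
    [noisy_argmax(votes, noise_scale) for _ in range(n_queries)],
    minlength=NUM_CLASSES,
)
label_freqs = counts / n_queries

# With more noise, non-top labels win more often, so the distribution of
# returned labels reflects the full histogram rather than just the argmax.
print("output label frequencies:", np.round(label_freqs, 3))
```

Because the adversary knows the noise distribution, the empirical label frequencies can in principle be inverted (for instance by maximum likelihood) to estimate the underlying vote counts. This also suggests the counter-intuitive effect the abstract reports: with little noise the mechanism almost always returns the plurality label and reveals nothing else, whereas heavier noise spreads the output distribution across labels, carrying more information about the histogram per query.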