Paper Title
Variational Model Inversion Attacks
Paper Authors
Paper Abstract
Given the ubiquity of deep neural networks, it is important that these models do not reveal information about sensitive data that they have been trained on. In model inversion attacks, a malicious user attempts to recover the private dataset used to train a supervised neural network. A successful model inversion attack should generate realistic and diverse samples that accurately describe each of the classes in the private dataset. In this work, we provide a probabilistic interpretation of model inversion attacks, and formulate a variational objective that accounts for both diversity and accuracy. In order to optimize this variational objective, we choose a variational family defined in the code space of a deep generative model, trained on a public auxiliary dataset that shares some structural similarity with the target dataset. Empirically, our method substantially improves performance in terms of target attack accuracy, sample realism, and diversity on datasets of faces and chest X-ray images.
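The abstract's probabilistic interpretation suggests one plausible form of the variational objective (a hedged reconstruction from the description above, not the paper's stated equations): approximating the class-conditional data distribution p(x|y) with a variational distribution q_φ(x) and minimizing the KL divergence, which decomposes as

$$
\mathrm{KL}\big(q_\phi(x)\,\|\,p(x\mid y)\big)
= -\,\mathbb{E}_{q_\phi}\!\big[\log p(y\mid x)\big]
\;-\; \mathbb{E}_{q_\phi}\!\big[\log p(x)\big]
\;-\; \mathcal{H}(q_\phi)
\;+\; \log p(y),
$$

where the first term rewards attack accuracy under the target classifier, the second rewards realism under a prior over images, the entropy term rewards diversity, and log p(y) is constant in φ.

Below is a minimal sketch of how such an objective might be optimized in practice, assuming a pretrained generator (standing in for the deep generative model trained on the public auxiliary dataset) and a frozen target classifier. The networks, dimensions, entropy weight, and the choice of a diagonal Gaussian variational family over latent codes are illustrative assumptions, not the paper's implementation; the realism term is handled implicitly by searching only within the generator's code space.

```python
# Hedged sketch (not the authors' code): fit a Gaussian variational
# distribution q(z) = N(mu, diag(sigma^2)) over the latent space of a
# pretrained generator so that decoded samples are classified as the
# target class, with an entropy bonus encouraging diverse samples.
import torch
import torch.nn as nn

LATENT_DIM, NUM_CLASSES, TARGET_CLASS = 64, 10, 3

# Hypothetical pretrained networks, used here as placeholders.
generator = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(), nn.Linear(256, 784))
classifier = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, NUM_CLASSES))
for p in list(generator.parameters()) + list(classifier.parameters()):
    p.requires_grad_(False)  # only the variational parameters are optimized

# Variational parameters of q(z).
mu = torch.zeros(LATENT_DIM, requires_grad=True)
log_sigma = torch.zeros(LATENT_DIM, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

for step in range(200):
    q = torch.distributions.Normal(mu, log_sigma.exp())
    z = q.rsample((128,))              # reparameterized latent samples
    x = generator(z)                   # decode latents to image space
    log_p_y = torch.log_softmax(classifier(x), dim=-1)[:, TARGET_CLASS]
    entropy = q.entropy().sum()        # diversity term (factorized Gaussian)
    # Accuracy term plus entropy bonus; 1e-2 is an illustrative weight.
    loss = -(log_p_y.mean() + 1e-2 * entropy)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After optimization, samples drawn from q(z) and passed through the generator play the role of the recovered class-conditional data; the entropy weight controls the trade-off between attack accuracy and sample diversity described in the abstract.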