Paper Title
Class Clown: Data Redaction in Machine Unlearning at Enterprise Scale
Paper Authors
Paper Abstract
Individuals are gaining more control of their personal data through recent data privacy laws such as the General Data Protection Regulation and the California Consumer Privacy Act. One aspect of these laws is the ability to request that a business delete private information, the so-called "right to be forgotten" or "right to erasure". These laws have serious financial implications for companies and organizations that train large, highly accurate deep neural networks (DNNs) on these valuable consumer data sets. However, a received redaction request poses complex technical challenges: how to comply with the law while fulfilling core business operations. We introduce a DNN model lifecycle maintenance process that establishes how to handle specific data redaction requests and minimizes the need to completely retrain the model. Our process is based upon the membership inference attack as a compliance tool for every point in the training set. These attack models quantify the privacy risk of all training data points and form the basis of follow-on data redaction from an accurate deployed model; excision is implemented through incorrect label assignment within incremental model updates.
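To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it uses a simple confidence-gap membership-inference score as a per-point privacy proxy, then "redacts" a requested training point by assigning it an incorrect label and applying an incremental model update instead of a full retrain. The dataset, the linear model, and the held-out-quantile threshold are all illustrative assumptions.

```python
# Sketch of (1) a confidence-based membership-inference audit and
# (2) redaction via incorrect label assignment in an incremental update.
# All modeling choices here are assumptions for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_train, y_train = X[:1000], y[:1000]      # data the deployed model was trained on
X_out, y_out = X[1000:], y[1000:]          # held-out data used to calibrate the attack

# Deployed model, trained with partial_fit so later redactions can reuse incremental updates.
clf = SGDClassifier(loss="log_loss", random_state=0)
classes = np.unique(y_train)
for _ in range(20):
    clf.partial_fit(X_train, y_train, classes=classes)

def membership_score(model, X, y):
    """Confidence assigned to the true label; higher values suggest membership."""
    proba = model.predict_proba(X)
    return proba[np.arange(len(y)), y]

# Privacy-risk audit: flag training points whose scores exceed what held-out data achieves.
train_scores = membership_score(clf, X_train, y_train)
threshold = np.quantile(membership_score(clf, X_out, y_out), 0.95)
risky = np.where(train_scores > threshold)[0]
print(f"{len(risky)} training points exceed the held-out confidence threshold")

# Redaction request for one point: assign an incorrect label and update incrementally.
idx = risky[0] if len(risky) else 0
wrong_label = rng.choice([c for c in classes if c != y_train[idx]])
for _ in range(10):
    clf.partial_fit(X_train[idx:idx + 1], np.array([wrong_label]))

print("post-redaction score on the requested point:",
      membership_score(clf, X_train[idx:idx + 1], y_train[idx:idx + 1])[0])
```

A practical deployment would replace the toy threshold with the paper's per-point attack models, but the flow is the same: audit membership risk, then excise requested points through small incremental updates rather than retraining from scratch.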