Paper title
PreFair: Privately Generating Justifiably Fair Synthetic Data
Authors
Abstract
When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data, while maintaining the privacy of the original data. Therefore, multiple works have been devoted to devising systems for DP synthetic data generation. However, such systems may preserve or even magnify properties of the data that make it unfair, rendering the synthetic data unfit for use. In this work, we present PreFair, a system that allows for DP fair synthetic data generation. PreFair extends the state-of-the-art DP data generation mechanisms by incorporating a causal fairness criterion that ensures fair synthetic data. We adapt the notion of justifiable fairness to fit the synthetic data generation scenario. We further study the problem of generating DP fair synthetic data, showing its intractability and designing algorithms that are optimal under certain assumptions. We also provide an extensive experimental evaluation, showing that PreFair generates synthetic data that is significantly fairer than the data generated by leading DP data generation mechanisms, while remaining faithful to the private data.
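As a rough illustration of what "DP synthetic data generation" means in general, the sketch below releases a noisy histogram of a single attribute with the Laplace mechanism and then samples synthetic values from it. This is not PreFair's algorithm and it enforces no fairness criterion; the function names, the toy data, the add/remove-one-record neighboring model (sensitivity 1), and the publicly chosen output size are all assumptions made for the example.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_histogram(records, domain, epsilon, rng):
    """Release an epsilon-DP histogram over a fixed, public `domain`."""
    counts = Counter(records)
    # Assumption: neighbors differ by adding/removing one record,
    # so the histogram has L1 sensitivity 1.
    scale = 1.0 / epsilon
    noisy = {v: counts.get(v, 0) + laplace_noise(scale, rng) for v in domain}
    # Clamping negatives is post-processing and does not affect privacy.
    return {v: max(0.0, c) for v, c in noisy.items()}

def sample_synthetic(noisy_counts, n, rng):
    """Draw n synthetic values proportionally to the noisy counts."""
    values = list(noisy_counts)
    weights = [noisy_counts[v] for v in values]
    if sum(weights) <= 0:
        weights = [1.0] * len(values)  # fall back to uniform sampling
    return rng.choices(values, weights=weights, k=n)

rng = random.Random(0)
private = ["A"] * 700 + ["B"] * 300          # toy private column
noisy = noisy_histogram(private, ["A", "B"], epsilon=1.0, rng=rng)
synthetic = sample_synthetic(noisy, n=1000, rng=rng)  # n chosen publicly
```

Only the noisy counts touch the private data, so any analysis run on `synthetic` inherits the same privacy guarantee by post-processing. Practical systems of the kind the abstract describes model many correlated attributes rather than a single column.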