Paper Title
PrivGen: Preserving Privacy of Sequences Through Data Generation
Authors
Abstract
Sequential data is everywhere, and it can serve as a basis for research that leads to improved processes. For example, road infrastructure can be improved by identifying bottlenecks in GPS data, and early diagnosis can be improved by analyzing patterns of disease progression in medical data. The main obstacle is that access to and use of such data are usually limited or not permitted at all due to concerns about violating user privacy, and rightly so. Anonymizing sequence data is not a simple task, since a user creates an almost unique signature over time. Existing anonymization methods reduce the quality of the information in order to maintain the required level of anonymity. This loss of quality may disrupt patterns that appear in the original data and impair the preservation of various characteristics. Since in many cases the researcher does not need the data as is and is instead interested only in the patterns that exist in the data, we propose PrivGen, an innovative method for generating data that maintains the patterns and characteristics of the source data. We demonstrate that the data generation mechanism significantly limits the risk of privacy infringement. Evaluating our method on real-world datasets shows that the generated data preserves many characteristics of the source data, including the sequential model trained on the source data. This suggests that the data generated by our method could be used in place of actual data for various types of analysis, maintaining user privacy and the data's integrity at the same time.