Paper Title

Overlook: Differentially Private Exploratory Visualization for Big Data

Authors

Pratiksha Thaker, Mihai Budiu, Parikshit Gopalan, Udi Wieder, Matei Zaharia

Abstract

Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of the data, to which users can make an unlimited number of queries. However, existing systems using synopses are built for offline use cases, where a set of queries is known ahead of time and the system carefully optimizes a synopsis for it. The synopses that these systems build are costly to compute and may also be costly to store. We introduce Overlook, a system that enables private data exploration at interactive latencies for both data analysts and data curators. The key idea in Overlook is a virtual synopsis that can be evaluated incrementally, without extra space storage or expensive precomputation. Overlook simply executes queries using an existing engine, such as a SQL DBMS, and adds noise to their results. Because Overlook's synopses do not require costly precomputation or storage, data curators can also use Overlook to explore the impact of privacy parameters interactively. Overlook offers a rich visual query interface based on the open source Hillview system. Overlook achieves accuracy comparable to existing synopsis-based systems, while offering better performance and removing the need for extra storage.
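The idea of running a query on an existing engine and adding noise to its result, with no stored synopsis, can be illustrated with a small sketch. This is not Overlook's actual implementation: it assumes a plain counting histogram with sensitivity 1, uses the standard Laplace mechanism, and derives each bucket's noise deterministically from a secret key so that repeating a query returns the same noisy answer. The names `laplace_noise`, `noisy_histogram`, `secret_key`, and `bucket_of` are hypothetical, and the in-memory `Counter` stands in for the query engine.

```python
import hashlib
import random
from collections import Counter


def laplace_noise(scale: float, secret_key: bytes, cell_id: str) -> float:
    """Laplace(0, scale) noise that is a fixed function of (secret_key, cell_id)."""
    digest = hashlib.sha256(secret_key + cell_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(values, bucket_of, epsilon, secret_key, buckets):
    """Compute exact bucket counts, then add per-bucket noise on the fly.

    Assumption: adding or removing one record changes one count by 1,
    so the Laplace scale is 1 / epsilon.
    """
    exact = Counter(bucket_of(v) for v in values)  # stand-in for a SQL GROUP BY
    scale = 1.0 / epsilon
    return {
        b: exact.get(b, 0) + laplace_noise(scale, secret_key, f"hist:{b}")
        for b in buckets
    }


if __name__ == "__main__":
    data = [1, 2, 2, 7, 9]
    first = noisy_histogram(data, lambda v: v // 5, 1.0, b"demo-key", [0, 1])
    second = noisy_histogram(data, lambda v: v // 5, 1.0, b"demo-key", [0, 1])
    print(first == second)  # the noise is recomputed, not stored
```

Because the noise is regenerated from the key rather than stored, nothing is precomputed or persisted. The sketch only covers one fixed bucketing; answering histograms at several granularities under a single privacy budget needs additional machinery that is not shown here.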
