论文标题
有条件的杜鹃过滤器
Conditional Cuckoo Filters
论文作者
论文摘要
Bloom过滤器,杜鹃过滤器和其他近似设置的成员草图具有广泛的应用。通常,如果项目不在数据集中,则可以跳过昂贵的操作。这些过滤器提供了一种廉价的,有效的记忆方法来测试项目是否在集合中并避免不必要的操作。现有的草图仅允许对单组的成员测试。但是,在某些应用程序(例如加入处理)中,相关集不是固定的,并且由一组谓词确定。 我们提出了条件杜鹃滤波器,这是对杜鹃滤波器的简单修改,允许在预先计算的草图上设置给定谓词的会员资格测试。该过滤器还引入了一种新型的链式技术,该技术使杜鹃过滤器可以处理重复键的插入。我们在加入处理应用程序上评估了我们的方法,并表明它们会大大减少加入必须处理的元素数量。
Bloom filters, cuckoo filters, and other approximate set membership sketches have a wide range of applications. Oftentimes, expensive operations can be skipped if an item is not in a data set. These filters provide an inexpensive, memory efficient way to test if an item is in a set and avoid unnecessary operations. Existing sketches only allow membership testing for single set. However, in some applications such as join processing, the relevant set is not fixed and is determined by a set of predicates. We propose the Conditional Cuckoo Filter, a simple modification of the cuckoo filter that allows for set membership testing given predicates on a pre-computed sketch. This filter also introduces a novel chaining technique that enables cuckoo filters to handle insertion of duplicate keys. We evaluate our methods on a join processing application and show that they significantly reduce the number of tuples that a join must process.