论文标题
实例最佳联接尺寸估计
Instance Optimal Join Size Estimation
论文作者
论文摘要
我们考虑了从实例最佳分析的角度来考虑有效估计预处理关系表的内部连接大小的问题。实例最佳算法的运行时间与验证解决方案的正确性所需的最小时间相当。以前的实例最佳算法仅在连接大小很小时才知道(作为其运行时间的一个组成部分,其在联接大小中是线性的)。我们提供了一种实例最佳算法,用于估算所有实例的联接大小,包括何时连接大小较大,通过删除对联接大小的依赖性。作为副产品,我们展示了如何在相当的时间内随机均匀地从联接中采样行。
We consider the problem of efficiently estimating the size of the inner join of a collection of preprocessed relational tables from the perspective of instance optimality analysis. The run time of instance optimal algorithms is comparable to the minimum time needed to verify the correctness of a solution. Previously instance optimal algorithms were only known when the size of the join was small (as one component of their run time that was linear in the join size). We give an instance optimal algorithm for estimating the join size for all instances, including when the join size is large, by removing the dependency on the join size. As a byproduct, we show how to sample rows from the join uniformly at random in a comparable amount of time.