Paper Title
Selectivity Estimation with Attribute Value Dependencies using Linked Bayesian Networks
Paper Authors
Paper Abstract
Relational query optimisers rely on cost models to choose between different query execution plans. Selectivity estimates are known to be a crucial input to the cost model. In practice, standard selectivity estimation procedures are prone to large errors. This is mostly because they rely on the so-called attribute value independence and join uniformity assumptions. Therefore, multidimensional methods have been proposed to capture dependencies between two or more attributes both within and across relations. However, these methods require a large computational cost which makes them unusable in practice. We propose a method based on Bayesian networks that is able to capture cross-relation attribute value dependencies with little overhead. Our proposal is based on the assumption that dependencies between attributes are preserved when joins are involved. Furthermore, we introduce a parameter for trading between estimation accuracy and computational cost. We validate our work by comparing it with other relevant methods on a large workload derived from the JOB and TPC-DS benchmarks. Our results show that our method is an order of magnitude more efficient than existing methods, whilst maintaining a high level of accuracy.
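To make the attribute value independence (AVI) assumption mentioned in the abstract concrete, here is a minimal sketch on a hypothetical toy relation with two correlated attributes, `city` and `country`. It is not the paper's method: the data, the function names, and the two-node factorisation P(country) · P(city | country) are assumptions chosen for illustration. It only shows why multiplying per-attribute selectivities can be badly off, and how a conditional factorisation of the kind a Bayesian network encodes recovers the dependency.

```python
from collections import Counter

# Hypothetical toy relation: (city, country) pairs with a strong dependency.
rows = (
    [("Paris", "France")] * 40
    + [("Lyon", "France")] * 10
    + [("Berlin", "Germany")] * 30
    + [("Munich", "Germany")] * 20
)
n = len(rows)

# Per-attribute and joint frequency counts.
city_counts = Counter(city for city, _ in rows)
country_counts = Counter(country for _, country in rows)
pair_counts = Counter(rows)


def true_selectivity(city, country):
    """Exact fraction of rows matching both predicates."""
    return pair_counts[(city, country)] / n


def avi_estimate(city, country):
    """AVI assumption: P(city, country) is approximated by P(city) * P(country)."""
    return (city_counts[city] / n) * (country_counts[country] / n)


def chain_rule_estimate(city, country):
    """Two-node factorisation: P(country) * P(city | country)."""
    if country_counts[country] == 0:
        return 0.0
    p_country = country_counts[country] / n
    p_city_given_country = pair_counts[(city, country)] / country_counts[country]
    return p_country * p_city_given_country


if __name__ == "__main__":
    query = ("Paris", "France")
    print("true      :", true_selectivity(*query))
    print("AVI       :", avi_estimate(*query))
    print("chain rule:", chain_rule_estimate(*query))
```

On this toy data the AVI estimate for `city = 'Paris' AND country = 'France'` is half the true selectivity, whereas the conditional factorisation is exact because the network has only two attributes. With more attributes, a Bayesian network keeps only some of the conditional dependencies, which is where a trade-off between accuracy and computational cost arises.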