论文标题
SafeBound:一种用于生成基数界限的实用系统
SafeBound: A Practical System for Generating Cardinality Bounds
论文作者
论文摘要
最近的工作重点强调了基数估计值对查询优化的重要性。尽管随着时间的流逝,新技术的准确性不断提高,但它们通常仍然允许低估,这通常会导致优化者做出过度乐观的决策。对于昂贵的查询来说,这可能是非常昂贵的。估计的另一种方法是基数界限,也称为悲观的基数估计,其中基数估计器提供了真正的基数的保证上限。通过永不低估,这种方法允许优化器避免潜在的效率低下计划。但是,现有的悲观基数估计量尚不实际:它们对数据使用非常有限的统计数据,无法处理谓词。在本文中,我们介绍了SafeBound,这是第一个生成基数界限的实用系统。 SafeBound建立在最新的理论工作的基础上,该工作使用与联接属性的度序列使用度序列来计算基数界限,用谓词扩展了此框架,并为程度序列引入了一种实用的压缩方法,并实现了有效的推理算法。在四个工作负载中,SafeBound的端到端运行时间比PostgreSQL低高出80%,并且通过改善昂贵查询的运行时,与基于ML的估计器和悲观的基于ML的估计量和悲观的心脏估计量相比,安全性高达80%。它还节省了高达500倍的查询计划时间,与最先进的基数估计方法相比,最多消耗的空间减少了6.8倍。
Recent work has reemphasized the importance of cardinality estimates for query optimization. While new techniques have continuously improved in accuracy over time, they still generally allow for under-estimates which often lead optimizers to make overly optimistic decisions. This can be very costly for expensive queries. An alternative approach to estimation is cardinality bounding, also called pessimistic cardinality estimation, where the cardinality estimator provides guaranteed upper bounds of the true cardinality. By never underestimating, this approach allows the optimizer to avoid potentially inefficient plans. However, existing pessimistic cardinality estimators are not yet practical: they use very limited statistics on the data, and cannot handle predicates. In this paper, we introduce SafeBound, the first practical system for generating cardinality bounds. SafeBound builds on a recent theoretical work that uses degree sequences on join attributes to compute cardinality bounds, extends this framework with predicates, introduces a practical compression method for the degree sequences, and implements an efficient inference algorithm. Across four workloads, SafeBound achieves up to 80% lower end-to-end runtimes than PostgreSQL, and is on par or better than state of the art ML-based estimators and pessimistic cardinality estimators, by improving the runtime of the expensive queries. It also saves up to 500x in query planning time, and uses up to 6.8x less space compared to state of the art cardinality estimation methods.