Paper Title


An Adaptive Benchmark for Modeling User Exploration of Large Datasets

Authors

Joanna Purich, Anthony Wise, Leilani Battle

Abstract


In this paper, we present a new DBMS performance benchmark that can simulate user exploration with any specified dashboard design made of standard visualization and interaction components. The distinguishing feature of our SImulation-BAsed (or SIMBA) benchmark is its ability to model user analysis goals as a set of SQL queries to be generated through a valid sequence of user interactions, as well as measure the completion of analysis goals by testing for equivalence between the user's previous queries and their goal queries. In this way, the SIMBA benchmark can simulate how an analyst opportunistically searches for interesting insights at the beginning of an exploration session and eventually hones in on specific goals towards the end. To demonstrate the versatility of the SIMBA benchmark, we use it to test the performance of four DBMSs with six different dashboard specifications and compare our results with IDEBench. Our results show how goal-driven simulation can reveal gaps in DBMS performance missed by existing benchmarking methods and across a range of data exploration scenarios.
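The abstract's core mechanism is modeling an analysis goal as a set of SQL queries and declaring the goal complete once the simulated user's issued queries cover them. A minimal sketch of that idea follows — this is an illustrative reconstruction, not the authors' implementation; the `GoalTracker` class, the `normalize` helper, and the example queries are all hypothetical, and real query-equivalence testing is substantially harder than the whitespace/case normalization shown here.

```python
# Hypothetical sketch of goal completion via query matching (not the SIMBA code).
# A goal is a set of target SQL queries; a simulated session emits queries,
# and the goal is "complete" once every goal query has been matched.

def normalize(sql: str) -> str:
    """Crude stand-in for query-equivalence testing:
    lowercase and collapse whitespace."""
    return " ".join(sql.lower().split())

class GoalTracker:
    def __init__(self, goal_queries):
        self.remaining = {normalize(q) for q in goal_queries}

    def observe(self, query: str) -> None:
        # Each query the simulated user issues may satisfy a goal query.
        self.remaining.discard(normalize(query))

    def complete(self) -> bool:
        return not self.remaining

# Example session: opportunistic exploration first, then goal queries.
goal = GoalTracker([
    "SELECT carrier, COUNT(*) FROM flights GROUP BY carrier",
    "SELECT AVG(delay) FROM flights WHERE carrier = 'aa'",
])
goal.observe("SELECT origin FROM flights LIMIT 10")   # exploratory, no match
goal.observe("select carrier, count(*)  from flights group by carrier")
goal.observe("SELECT AVG(delay) FROM flights WHERE carrier = 'aa'")
print(goal.complete())  # → True
```

In the paper's framing, the queries fed to `observe` would instead be generated by a valid sequence of simulated dashboard interactions, so completion also certifies that the goal is reachable through the dashboard's interaction components.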
