Paper Title
ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees
Paper Authors
Abstract
Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analytic approaches that support exploration and decision making by providing progressive results, before the final and exact ones have been computed. Prior works lack both efficiency and accuracy when applied to large-scale data series collections. We present and experimentally evaluate ProS, a new probabilistic learning-based method that provides quality guarantees for progressive Nearest Neighbor (NN) query answering. We develop our method for k-NN queries and demonstrate how it can be applied with the two most popular distance measures, namely, Euclidean and Dynamic Time Warping (DTW). We provide both initial and progressive estimates of the final answer that improve as the similarity search proceeds, as well as suitable stopping criteria for the progressive queries. Moreover, we describe how this method can be used to develop a progressive algorithm for data series classification (based on a k-NN classifier), and we additionally propose a method designed specifically for the classification task. Experiments with several diverse synthetic and real datasets demonstrate that our prediction methods constitute the first practical solutions to the problem, significantly outperforming competing approaches. This paper was published in the VLDB Journal (2022).
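The abstract does not spell out ProS's estimation models, so the following is only a minimal, hypothetical sketch of what a "progressive" k-NN answer looks like. It shows a sequential scan that yields the best-so-far neighbors after each series is examined, usable with either Euclidean distance or DTW, plus a majority-vote k-NN classifier applied to the intermediate answer. The names `progressive_knn`, `dtw`, and `knn_label` are illustrative assumptions. The DTW variant (unconstrained, squared point costs, square root at the end) is one common convention, not necessarily the paper's. The probabilistic quality estimates and stopping criteria that ProS adds on top of such intermediate answers are not reproduced, and real systems typically visit candidates in an index-guided order rather than a plain scan.

```python
import heapq
import math
from collections import Counter
from typing import Callable, Iterator, List, Sequence, Tuple

Series = Sequence[float]
Answer = List[Tuple[float, int]]  # (distance, index) pairs, nearest first


def euclidean(a: Series, b: Series) -> float:
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: Series, b: Series) -> float:
    """Unconstrained DTW (assumed convention): squared point costs along
    the cheapest warping path, followed by a square root."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def progressive_knn(
    query: Series,
    collection: Sequence[Series],
    k: int,
    dist: Callable[[Series, Series], float] = euclidean,
) -> Iterator[Answer]:
    """Hypothetical helper: yield the best-so-far k-NN answer after each
    candidate series is examined (a plain scan, not an index traversal)."""
    heap: List[Tuple[float, int]] = []  # max-heap on distance via negation
    for idx, series in enumerate(collection):
        d = dist(query, series)
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif d < -heap[0][0]:  # closer than the current k-th neighbor
            heapq.heapreplace(heap, (-d, idx))
        # Intermediate answer, sorted nearest first
        yield sorted((-neg_d, i) for neg_d, i in heap)


def knn_label(answer: Answer, labels: Sequence[str]) -> str:
    """Majority vote over the labels of the current (possibly intermediate) neighbors."""
    return Counter(labels[i] for _, i in answer).most_common(1)[0][0]


if __name__ == "__main__":
    data = [[0, 0, 0], [1, 1, 1], [0, 0, 1], [5, 5, 5]]
    labels = ["a", "b", "a", "c"]
    for step, answer in enumerate(progressive_knn([0, 0, 1], data, k=2, dist=dtw)):
        print(step, [i for _, i in answer], knn_label(answer, labels))
```

Each yielded answer can only get closer to the exact one: once k candidates are held, the k-th distance never increases. Deciding when an intermediate answer is already good enough to stop is exactly the question the abstract says ProS answers with probabilistic guarantees; this sketch makes no such claim.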