Paper Title
TaSPM: Targeted Sequential Pattern Mining
Paper Authors
Abstract
Sequential pattern mining (SPM) is an important pattern mining technique with many real-world applications. Although many efficient SPM algorithms have been proposed, few studies focus on target sequences. Querying sequential patterns in a targeted way can not only reduce the number of sequences generated by SPM but also make pattern analysis more efficient for users. Existing algorithms for targeted sequence querying are designed for specific scenarios and cannot be generalized to other applications. In this paper, we formulate the problem of targeted sequential pattern mining and propose a generic framework, TaSPM, based on the fast CM-SPAM algorithm. Moreover, to improve the efficiency of TaSPM on large-scale datasets and on multi-item sequence datasets, we propose several pruning strategies that reduce meaningless operations during mining. In total, four pruning strategies are designed in TaSPM, allowing it to terminate unnecessary pattern extensions quickly and achieve better performance. Finally, we conduct extensive experiments on different datasets to compare existing SPM algorithms with TaSPM. The experiments show that the novel targeted mining algorithm TaSPM achieves faster running times and lower memory consumption.
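To make the problem statement concrete, here is a minimal sketch in Python. It assumes the common formulation in which a targeted sequential pattern is a frequent pattern (support at least minsup) that contains the user's target sequence as a subsequence. It is a naive baseline, not TaSPM: it enumerates every frequent pattern depth-first with SPAM-style s-extensions and i-extensions and filters by the target afterwards. It does not use CM-SPAM's bitmaps or co-occurrence maps, nor any of TaSPM's four pruning strategies. The names make_seq, contains, support, and mine_targeted, as well as the toy database, are hypothetical and chosen for illustration.

```python
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]  # a sequence is an ordered tuple of itemsets


def make_seq(*itemsets: str) -> Seq:
    """Hypothetical helper: one string per itemset, e.g. make_seq("ab", "c") = <{a,b},{c}>."""
    return tuple(frozenset(s) for s in itemsets)


def contains(s: Seq, pattern: Seq) -> bool:
    """True if `pattern` is a subsequence of `s`: each itemset of `pattern`
    is a subset of a distinct itemset of `s`, in the same order."""
    pos = 0
    for itemset in pattern:
        # Greedy earliest match is sufficient for subsequence containment.
        while pos < len(s) and not itemset <= s[pos]:
            pos += 1
        if pos == len(s):
            return False
        pos += 1
    return True


def support(db: List[Seq], pattern: Seq) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(contains(s, pattern) for s in db)


def mine_targeted(db: List[Seq], minsup: int, target: Seq) -> List[Tuple[Seq, int]]:
    """Naive targeted SPM (hypothetical, no pruning): enumerate all frequent
    patterns depth-first and keep those that contain `target`."""
    assert minsup >= 1, "minsup must be at least 1 so the search terminates"
    items = sorted({i for s in db for itemset in s for i in itemset})
    results: List[Tuple[Seq, int]] = []

    def grow(pattern: Seq) -> None:
        for item in items:
            # s-extension: append {item} as a new, later itemset.
            candidates = [pattern + (frozenset([item]),)]
            # i-extension: add item to the last itemset; requiring it to exceed
            # every item already there generates each pattern exactly once.
            if pattern and item > max(pattern[-1]):
                candidates.append(pattern[:-1] + (pattern[-1] | {item},))
            for cand in candidates:
                sup = support(db, cand)
                if sup < minsup:
                    continue  # downward closure: no extension can be frequent
                if contains(cand, target):
                    results.append((cand, sup))
                grow(cand)

    grow(())
    return results


db = [make_seq("a", "ab", "c"), make_seq("a", "c"), make_seq("ab", "c"), make_seq("b", "c")]
result = mine_targeted(db, minsup=2, target=make_seq("a", "c"))
for pattern, sup in result:
    print(["".join(sorted(i)) for i in pattern], sup)
# prints:
# ['ab', 'c'] 2
# ['a', 'c'] 3
```

The target filter cannot be used to stop the search early in this naive version. A pattern that does not yet contain the target may still grow into one that does, so every frequent pattern is generated and most are discarded. This is the kind of unnecessary pattern extension that TaSPM's pruning strategies aim to terminate quickly.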