Title
Memory-Efficient Sequential Pattern Mining with Hybrid Tries
Authors
Abstract
This paper develops a memory-efficient approach for Sequential Pattern Mining (SPM), a fundamental topic in knowledge discovery that faces a well-known memory bottleneck for large data sets. Our methodology involves a novel hybrid trie data structure that exploits recurring patterns to compactly store the data set in memory, and a corresponding mining algorithm designed to effectively extract patterns from this compact representation. Numerical results on small to medium-sized real-life test instances show an average improvement of 85% in memory consumption and 49% in computation time compared to the state of the art. For large data sets, our algorithm stands out as the only capable SPM approach within 256 GB of system memory, potentially saving 1.7 TB in memory consumption.
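As a rough illustration of why a trie can store a sequence data set compactly, the sketch below builds a plain prefix trie over a toy data set and compares the number of trie nodes with the number of items in the raw sequences. This is not the paper's hybrid trie, whose design the abstract does not describe. The names `TrieNode`, `build_trie`, and `node_count`, the toy data, and the simplification that each sequence element is a single item are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One node of a plain prefix trie over item sequences (illustrative only)."""
    children: dict = field(default_factory=dict)  # maps item -> TrieNode
    count: int = 0  # number of stored sequences passing through this node


def build_trie(sequences):
    """Insert every sequence; sequences sharing a prefix share the same nodes."""
    root = TrieNode()
    for seq in sequences:
        node = root
        for item in seq:
            if item not in node.children:
                node.children[item] = TrieNode()
            node = node.children[item]
            node.count += 1
    return root


def node_count(node):
    """Count the nodes below `node` (the root itself is not counted)."""
    return sum(1 + node_count(child) for child in node.children.values())


# Hypothetical toy data set: four sequences, 12 items in total.
sequences = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "b", "c"],
    ["b", "c", "d"],
]

root = build_trie(sequences)
total_items = sum(len(s) for s in sequences)
trie_nodes = node_count(root)
print(f"items in raw sequences: {total_items}, trie nodes: {trie_nodes}")
```

In this toy case the repeated prefix `a, b` is stored once and the duplicate sequence adds no new nodes, so the trie needs fewer nodes than the raw data has items. The per-node `count` is the kind of bookkeeping a mining algorithm could use to compute pattern support, although how the paper's algorithm does so is not stated in the abstract.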