Paper Title
Totally-ordered Sequential Rules for Utility Maximization
Authors
Abstract
High utility sequential pattern mining (HUSPM) is an important task in knowledge discovery and data analytics with many real-world applications. In some cases, however, HUSPM cannot provide a good measure for predicting what will happen next. High utility sequential rule mining (HUSRM) discovers sequential rules with both high utility and high confidence, which addresses this limitation of HUSPM. All existing HUSRM algorithms aim to find high-utility partially-ordered sequential rules (HUSRs), which are inconsistent with reality and may produce spurious HUSRs. Therefore, in this paper, we formulate the problem of high utility totally-ordered sequential rule mining and propose two novel algorithms, TotalSR and TotalSR+, which aim to identify all high utility totally-ordered sequential rules (HTSRs). TotalSR builds a utility table that efficiently calculates antecedent support, and a utility prefix sum list that computes the remaining utility of a sequence in O(1) time. We also introduce a left-first expansion strategy that exploits the anti-monotonic property to enable a confidence pruning strategy. TotalSR further reduces the search space drastically by means of pruning strategies based on utility upper bounds, avoiding a large amount of unnecessary computation. In addition, TotalSR+ uses an auxiliary antecedent record table to discover HTSRs more efficiently. Finally, extensive experiments on both real and synthetic datasets demonstrate that TotalSR is significantly more efficient than algorithms with fewer pruning strategies, and that TotalSR+ is significantly more efficient than TotalSR in terms of running time and scalability.
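The idea behind a utility prefix sum list can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a sequence has been flattened into a list of per-item utilities, and the function names `build_prefix_sums` and `remaining_utility`, as well as the utility values, are hypothetical. Once the prefix sums are stored, the utility remaining after any position is a single subtraction.

```python
from itertools import accumulate


def build_prefix_sums(utilities):
    """Return a list where entry i is the total utility of items 0..i."""
    return list(accumulate(utilities))


def remaining_utility(prefix, i):
    """Utility of all items strictly after position i, computed in O(1)."""
    return prefix[-1] - prefix[i]


# Hypothetical per-item utilities of one flattened sequence (illustrative only).
utilities = [5, 2, 8, 1, 4]
prefix = build_prefix_sums(utilities)

# Remaining utility after position 1 is 8 + 1 + 4.
rest_after_1 = remaining_utility(prefix, 1)
```

Building the list costs one linear pass per sequence, after which every remaining-utility query avoids rescanning the tail of the sequence.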