论文标题
非重叠(delta,伽马) - 适当模式匹配
Nonoverlapping (delta, gamma)-approximate pattern matching
论文作者
论文摘要
模式匹配可用于计算模式的支持,并且是顺序模式挖掘(或序列模式挖掘)中的关键问题。非重叠的模式匹配意味着两个出现在同一位置处的序列中不能使用相同的字符。近似模式匹配可以产生一些数据噪声,并且比精确的模式匹配更通用。目前,非重叠的近似模式匹配是基于锤距离的,该距离距离不能用来测量子序列和模式之间的局部近似,从而导致匹配结果的偏差很大。为了解决这个问题,我们提出了一种非重叠的三角洲和伽玛近似模式匹配(NDP)方案,该方案采用了(Delta,Gamma) - 距离进行近似模式匹配,而本地和全球距离分别不超过Delta和Gamma。我们首先将NDP问题转换为局部近似NetTree,然后构建一种有效的算法,称为NDP(NETNDP)的局部近似NetTree。我们提出了一种称为最小根距离的新方法,该方法使我们能够确定一个节点是否具有满足全局约束并修剪无效节点和亲子关系的根路径。 NETNDP找到了最大根的最右侧绝对叶,从最右侧的绝对叶子中搜索最右边的出现,并删除了这种情况。我们迭代上述步骤,直到没有新事件为止。许多实验用于验证所提出的算法的性能。
Pattern matching can be used to calculate the support of patterns, and is a key issue in sequential pattern mining (or sequence pattern mining). Nonoverlapping pattern matching means that two occurrences cannot use the same character in the sequence at the same position. Approximate pattern matching allows for some data noise, and is more general than exact pattern matching. At present, nonoverlapping approximate pattern matching is based on Hamming distance, which cannot be used to measure the local approximation between the subsequence and pattern, resulting in large deviations in matching results. To tackle this issue, we present a Nonoverlapping Delta and gamma approximate Pattern matching (NDP) scheme that employs the (delta, gamma)-distance to give an approximate pattern matching, where the local and the global distances do not exceed delta and gamma, respectively. We first transform the NDP problem into a local approximate Nettree and then construct an efficient algorithm, called the local approximate Nettree for NDP (NetNDP). We propose a new approach called the Minimal Root Distance which allows us to determine whether or not a node has root paths that satisfy the global constraint and to prune invalid nodes and parent-child relationships. NetNDP finds the rightmost absolute leaf of the max root, searches for the rightmost occurrence from the rightmost absolute leaf, and deletes this occurrence. We iterate the above steps until there are no new occurrences. Numerous experiments are used to verify the performance of the proposed algorithm.