论文标题
与$ k $不匹配的最长常见子字符串近似:理论与实践
Approximating longest common substring with $k$ mismatches: Theory and practice
论文作者
论文摘要
在$ k $不匹配的最长常见子字符串的问题中,我们得到了两个字符串$ x,y $,并且必须找到最大长度$ \ ell $,以便有一个长度 - $ \ ell $ $ x $的$ x $和一个长度 - $ \ ell $ y $ y $的$ y $,最多在$ k $上有所不同。长度$ \ ell $可以用作$ x,y $之间相似性的强大度量。在这项工作中,我们开发了用于计算$ \ ell $的新近似算法,从理论角度来看,这些$ \ ell $明显比以前已知的解决方案要高得多。我们的方法是简单且实用的,我们通过实验评估确认,并且在我们通过条件下限证明时可能接近最佳。
In the problem of the longest common substring with $k$ mismatches we are given two strings $X, Y$ and must find the maximal length $\ell$ such that there is a length-$\ell$ substring of $X$ and a length-$\ell$ substring of $Y$ that differ in at most $k$ positions. The length $\ell$ can be used as a robust measure of similarity between $X, Y$. In this work, we develop new approximation algorithms for computing $\ell$ that are significantly more efficient that previously known solutions from the theoretical point of view. Our approach is simple and practical, which we confirm via an experimental evaluation, and is probably close to optimal as we demonstrate via a conditional lower bound.