论文标题

低距离制度中的几乎最佳的司额定时间编辑距离

Almost-Optimal Sublinear-Time Edit Distance in the Low Distance Regime

论文作者

Bringmann, Karl, Cassis, Alejandro, Fischer, Nick, Nakos, Vasileios

论文摘要

我们重新审查以均方根时间计算编辑距离的任务。在$(k,k)$ - 间隙编辑距离问题中,任务是区分两个字符串的编辑距离最多是$ k $还是至少$ k $。它是由Goldenberg,Krauthgamer和Saha(Focs '19)建立的,并进行了Kociumaka和Saha(focs '20)的改进,即可以在时间$ \ \ \ widetilde o(n/k+\ \ \ \ \ \ \ \ permatorname {poly}(k)(k)(k)$。在这一研究中,最自然的问题之一是$(k,k^2)$ - 差距是否最适合运行时间$ \ widetilde o(n/k+\ operatotorname {poly}(k))$。 在这项工作中,我们通过显着改善差距来回答这个问题。具体来说,我们表明,在时间$ o(n/k+\ peripatorName {poly}(k))$我们甚至可以求解$(k,k,k^{1+o(1)})$ - 差距问题。这是第一种在此运行时间中打破$(k,k^2)$ - 差距的算法。从下面的意义上讲,我们的算法几乎是最佳的:在低距离方案($ k \ le n^{0.19} $)中,我们的运行时间变为$ O(n/k)$,它与已知的$ n/k^{1+o(1)} $ bount符合$ n/k^{1+o(1)} $ lownd键$(k,k,k,k^{1+o(1+o(1+o(1+o(1+o(1+o(1+o(1+o(1+o(1+o), 我们的结果还表明,锤距距离和在低距离状态下的编辑距离具有令人惊讶的相似性:对于$(k,k,k^{1+o(1)})$ - 差距问题具有时间复杂性$ n/k^{1 \ pm o(1)} $ for小$ k $。 与以前的工作相反,该工作采用了Landau-Vishkin算法的子采样变体,我们基于Andoni,Krauthgamer和Onak(Focs '10)的算法。我们首先简化了他们的方法,然后展示如何有效修剪其计算树,以便在给定的时间绑定中获得sublrinear-time算法。在此方面,我们对(本地和全局)模式使用各种结构性见解,这些洞察力在此过程中可能会出现并设计适当的财产测试人员以有效检测这些模式。

We revisit the task of computing the edit distance in sublinear time. In the $(k,K)$-gap edit distance problem the task is to distinguish whether the edit distance of two strings is at most $k$ or at least $K$. It has been established by Goldenberg, Krauthgamer and Saha (FOCS '19), with improvements by Kociumaka and Saha (FOCS '20), that the $(k,k^2)$-gap problem can be solved in time $\widetilde O(n/k+\operatorname{poly}(k))$. One of the most natural questions in this line of research is whether the $(k,k^2)$-gap is best-possible for the running time $\widetilde O(n/k+\operatorname{poly}(k))$. In this work we answer this question by significantly improving the gap. Specifically, we show that in time $O(n/k+\operatorname{poly}(k))$ we can even solve the $(k,k^{1+o(1)})$-gap problem. This is the first algorithm that breaks the $(k,k^2)$-gap in this running time. Our algorithm is almost optimal in the following sense: In the low distance regime ($k\le n^{0.19}$) our running time becomes $O(n/k)$, which matches a known $n/k^{1+o(1)}$ lower bound for the $(k,k^{1+o(1)})$-gap problem up to lower order factors. Our result also reveals a surprising similarity of Hamming distance and edit distance in the low distance regime: For both, the $(k,k^{1+o(1)})$-gap problem has time complexity $n/k^{1\pm o(1)}$ for small $k$. In contrast to previous work, which employed a subsampled variant of the Landau-Vishkin algorithm, we instead build upon the algorithm of Andoni, Krauthgamer and Onak (FOCS '10). We first simplify their approach and then show how to to effectively prune their computation tree in order to obtain a sublinear-time algorithm in the given time bound. Towards that, we use a variety of structural insights on the (local and global) patterns that can emerge during this process and design appropriate property testers to effectively detect these patterns.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源