论文标题

glap型图案匹配的快速索引

Fast Indexes for Gapped Pattern Matching

论文作者

Cáceres, Manuel, Puglisi, Simon J., Zhukova, Bella

论文摘要

我们描述了搜索大型数据集的索引,以了解可变长度 - 胶合(VLG)模式。 VLG模式由两个或多个子图案组成,在每个相邻对之间是一个间隙约束,指定了subpaterns之间允许的距离上的上限和下限。 VLG模式在计算生物学(主题搜索),信息检索(例如,对于语言模型,摘要生成,机器翻译)中有许多应用,并捕获了在搜索源代码中通常使用的正则表达式的有用子类。我们的最佳方法提供的搜索速度几倍比以前的艺术在广泛的模式和文本中更快。

We describe indexes for searching large data sets for variable-length-gapped (VLG) patterns. VLG patterns are composed of two or more subpatterns, between each adjacent pair of which is a gap-constraint specifying upper and lower bounds on the distance allowed between subpatterns. VLG patterns have numerous applications in computational biology (motif search), information retrieval (e.g., for language models, snippet generation, machine translation) and capture a useful subclass of the regular expressions commonly used in practice for searching source code. Our best approach provides search speeds several times faster than prior art across a broad range of patterns and texts.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源