论文标题

诱导后缀排序的语法压缩

Grammar Compression By Induced Suffix Sorting

论文作者

Nunes, Daniel S. N., Louza, Felipe A., Gog, Simon, Ayala-Rincón, Mauricio, Navarro, Gonzalo

论文摘要

这项工作引入了一种称为GCIS的语法压缩算法。 GCIS基于Nong等人提出的诱导后缀排序算法SAI。在2009年。提出的解决方案建立在后缀排序过程中SAIS执行的分解基础上。无上下文的语法用于用非末端代替因素。然后将算法递归应用于较短的非末端序列。通过利用一些冗余(例如,根据SAIS对右手的规则之间的共同前缀)来编码产生的语法。 GCIS符合其压缩所需的低空间和时间,同时获得竞争性压缩比。我们对常规和重复,中等和非常大的文本进行了实验,表明GCIS是非常方便的选择,与众所周知的压缩机(例如GZIP,7-ZIP和Repair)相比,语法压缩中的金标准标准。作为交换,GCIS解压缩时很慢。但是,语法压缩机比Lempel-Ziv压缩机更方便,因为人们可以直接以压缩形式访问文本子字符串,而无需解压缩文本。我们证明GCIS是这种情况的绝佳候选者,因为它表明在其基于维修的替代方案中具有竞争力。我们还展示了GCIS与SAI的关系如何使其成为构建后缀阵列和LCP阵列的良好中间结构。

A grammar compression algorithm, called GCIS, is introduced in this work. GCIS is based on the induced suffix sorting algorithm SAIS, presented by Nong et al. in 2009. The proposed solution builds on the factorization performed by SAIS during suffix sorting. A context-free grammar is used to replace factors by non-terminals. The algorithm is then recursively applied on the shorter sequence of non-terminals. The resulting grammar is encoded by exploiting some redundancies, such as common prefixes between right-hands of rules, sorted according to SAIS. GCIS excels for its low space and time required for compression while obtaining competitive compression ratios. Our experiments on regular and repetitive, moderate and very large texts, show that GCIS stands as a very convenient choice compared to well-known compressors such as Gzip, 7-Zip, and RePair, the gold standard in grammar compression. In exchange, GCIS is slow at decompressing. Yet, grammar compressors are more convenient than Lempel-Ziv compressors in that one can access text substrings directly in compressed form, without ever decompressing the text. We demonstrate that GCIS is an excellent candidate for this scenario because it shows to be competitive among its RePair based alternatives. We also show, how GCIS relation with SAIS makes it a good intermediate structure to build the suffix array and the LCP array during decompression of the text.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源