Paper Title

Grammar compression with probabilistic context-free grammar

Paper Authors

Hiroaki Naganuma, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara, Naoki Kobayashi

Paper Abstract

We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = \{T\}$. Such a grammar is often called a \emph{straight-line program} (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar $G$ and the derivation tree of $T$ from the start symbol in $G$, in compressed form. We show some simple evidence that our proposal is indeed more efficient than SLPs for certain texts, both from theoretical and practical points of view.
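
To illustrate the distinction the abstract draws, below is a minimal Python sketch, not the authors' implementation: the grammars, rule choices, and helper names are illustrative assumptions. It contrasts an SLP, whose single rule per nonterminal makes expansion deterministic, with an ambiguous grammar where a recorded derivation tree (here, a preorder list of rule indices) is needed to recover $T$. The actual method would additionally use the rule probabilities to entropy-code the stored choices, which this sketch omits.

```python
# SLP: a CFG in Chomsky normal form with L(G) = {T}.
# Each nonterminal has exactly one rule, so expansion is deterministic.
# (Hypothetical example grammar; here T = "ababab".)
slp = {
    "S": ("A", "B"),    # S  -> A B
    "A": ("Xa", "Xb"),  # A  -> Xa Xb
    "B": ("A", "A"),    # B  -> A A
    "Xa": "a",          # Xa -> a   (terminal rule)
    "Xb": "b",          # Xb -> b   (terminal rule)
}

def expand_slp(sym, rules):
    """Deterministically expand an SLP nonterminal into its unique string."""
    rhs = rules[sym]
    if isinstance(rhs, str):  # terminal rule X -> a
        return rhs
    left, right = rhs
    return expand_slp(left, rules) + expand_slp(right, rules)

# Ambiguous grammar: a nonterminal may have several rules, so L(G) contains
# strings other than T.  To recover T unambiguously we also keep the
# derivation tree, stored here as rule indices in preorder.
ambiguous = {
    "S": [("A", "A"), ("A", "B")],
    "A": [("a",), ("b",)],
    "B": [("a", "b")],
}

def expand_with_choices(sym, rules, choices):
    """Expand by consuming the recorded rule index at each nonterminal
    (preorder traversal; `choices` is consumed destructively)."""
    if sym not in rules:  # terminal symbol
        return sym
    rule = rules[sym][choices.pop(0)]
    return "".join(expand_with_choices(s, rules, choices) for s in rule)

print(expand_slp("S", slp))                            # -> "ababab"
print(expand_with_choices("S", ambiguous, [1, 0, 0]))  # -> "aab"
```

In the second case, the pair (grammar, choice sequence [1, 0, 0]) plays the role of the compressed representation: the grammar alone would be ambiguous, but the stored derivation pins down exactly one string.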
