论文标题

ANS熵编码的效率

Efficiency of ANS Entropy Encoders

论文作者

Kosolobov, Dmitry

论文摘要

非对称数字系统(ANS)是一类熵编码器,对数据压缩产生巨大影响,替代算术和Huffman编码。它是由不同的作者研究的,但是尚未完全理解其冗余的确切渐近造型(相对于熵)。我们获得了最受欢迎的ANS变体的表ANS(TAN)的冗余的最佳界限。 Given a sequence $a_1,a_2,\ldots,a_n$ of symbols from an alphabet $\{0,1,\ldots,σ-1\}$ such that each symbol $a$ occurs in it $f_a$ times and $n=2^r$, the tANS encoder using Duda's ``precise initialization'' to fill tANS tables transforms this sequence into a bit string of以下长度(编码中未包含频率):$ \ sum \ limits_ {a \ in [0..σ)} f_a \ cdot \ cdot \ cdot \ log \ frac {n} {n} {f_a}+o(σ+r)$,其中$ o(σ+r)$ boldy $ o(σ+r)可以按$ n log nog form bold。 $ r $ bit的术语是ANS必不可少的工件;其余的会导致每个符号的$ O(\fracσ{n})$位的冗余。我们通过示例表明$ω(σ+r)$冗余是必要的。我们认为,对于TAN的大多数适当初始化方法也存在类似的示例。因此,我们驳斥了杜达(Duda)的猜想,即每个符号的冗余为$ o(\fracσ{n^2})$位。我们还提出了一个范围ANS(RANS)的变体,称为固定精度,由$ k \ ge 1 $进行参数化。在此变体中,只有在其结果属于$ [2^k..2^{k+1})$的情况下才能执行整数部门。因此,可以通过更快的方法来计算该部门,提供$ K $很小。我们通过$ \ frac {n} {2^k-1} \ log log e+r $绑定了RANS变体的冗余。

Asymmetric Numeral Systems (ANS) is a class of entropy encoders that had an immense impact on the data compression, substituting arithmetic and Huffman coding. It was studied by different authors but the precise asymptotics of its redundancy (in relation to the entropy) was not completely understood. We obtain optimal bounds for the redundancy of the tabled ANS (tANS), the most popular ANS variant. Given a sequence $a_1,a_2,\ldots,a_n$ of symbols from an alphabet $\{0,1,\ldots,σ-1\}$ such that each symbol $a$ occurs in it $f_a$ times and $n=2^r$, the tANS encoder using Duda's ``precise initialization'' to fill tANS tables transforms this sequence into a bit string of the following length (the frequencies are not included in the encoding): $\sum\limits_{a\in[0..σ)}f_a\cdot\log\frac{n}{f_a}+O(σ+r)$, where $O(σ+r)$ can be bounded by $σ\log e+r$. The $r$-bit term is an artifact indispensable to ANS; the rest incurs a redundancy of $O(\fracσ{n})$ bits per symbol. We complement this by examples showing that an $Ω(σ+r)$ redundancy is necessary. We argue that similar examples exist for most adequate initialization methods for tANS. Thus, we refute Duda's conjecture that the redundancy is $O(\fracσ{n^2})$ bits per symbol. We also propose a variant of the range ANS (rANS), called rANS with fixed accuracy, parameterized by $k\ge 1$. In this variant the integer division, which is unavoidable in rANS, is performed only when its result belongs to $[2^k..2^{k+1})$. Therefore, the division can be computed by faster methods provided $k$ is small. We bound the redundancy for our rANS variant by $\frac{n}{2^k-1}\log e+r$.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源