论文标题
通过星形PFA对Trie高度的平滑分析
Smoothed Analysis of Trie Height by Star-like PFAs
论文作者
论文摘要
尝试是信息检索的通用数据结构。 Trie的最重要参数是其高度$ h $,其等于集合$ a $ a $的最长公共前缀的长度。对随机尝试的分析调查表明,$ {\ bf e}(h)\在o(\ log(\ | a \ |))$中,尽管在最坏情况下$ h $是无限的。此外,$ H $的分销功能的敏锐结果是许多随机弦源而闻名的。但是,由于平均案例分析背后建模的固有弱点 - 分析以随机数据为主 - 这些结果可以完全解释以下事实:在许多实际情况下,Trie高度是对数。我们提出了一个新的半随机字符串模型,并执行平滑的分析,以便为实际发现提供更严格的解释。我们认为的扰动函数是基于概率有限自动机(PFA)的,我们表明代表PFA的过渡概率完全表征了平滑Trie高度的渐近生长。我们的主要结果是二分法性质---对数或无界的 - 乍看之下肯定不足为奇,但是我们还提供了定量的上限和下限,这些上限是使用多变量生成函数得出的,以表达扰动PFA的计算。直接结果是用于编辑扰动的对数Trie高度(即随机插入,删除和替换)。
Tries are general purpose data structures for information retrieval. The most significant parameter of a trie is its height $H$ which equals the length of the longest common prefix of any two string in the set $A$ over which the trie is built. Analytical investigations of random tries suggest that ${\bf E}(H)\in O(\log(\|A\|))$, although $H$ is unbounded in the worst case. Moreover, sharp results on the distribution function of $H$ are known for many different random string sources. But because of the inherent weakness of the modeling behind average-case analysis---analyses being dominated by random data---these results can utterly explain the fact that in many practical situations the trie height is logarithmic. We propose a new semi-random string model and perform a smoothed analysis in order to give a mathematically more rigorous explanation for the practical findings. The perturbation functions which we consider are based on probabilistic finite automata (PFA) and we show that the transition probabilities of the representing PFA completely characterize the asymptotic growth of the smoothed trie height. Our main result is of dichotomous nature---logarithmic or unbounded---and is certainly not surprising at first glance, but we also give quantitative upper and lower bounds, which are derived using multivariate generating function in order to express the computations of the perturbing PFA. A direct consequence is the logarithmic trie height for edit perturbations(i.e., random insertions, deletions and substitutions).