论文标题
水桶排序和随机尝试的上尾分析
Upper Tail Analysis of Bucket Sort and Random Tries
论文作者
论文摘要
众所周知,当输入键以间隔$ [0,1)$独立和均匀分配时,将在预期线性时间内运行。即使使用二次时间算法来对每个存储桶中的键进行分类,该分析仍存在。我们展示了如何在具有很高概率的存储桶排序的运行时间上获得线性时间。具体而言,我们研究了指数的渐近行为,在桶排序的运行时间的上尾概率中。我们认为与期望的较大添加剂偏差,对于大于(常数)$ c $的表格$ cn $,其中$ n $是分类的键数。 我们的分析表明,在每个存储桶中使用二次时间算法的桶形排序变体与使用$θ(b \ log b)$ time算法在存储桶中对$ b $ beys排序的变体。当使用二次时间算法将键中的键排序时,桶排序的概率比预期的时间比预期的时间多于$θ(\ sqrt {n} \ log n)$。当使用$θ(b \ log b)$算法将键在存储桶中排序时,指数为$θ(n)$。我们通过在尝试上定义的随机变量的尾部上显示上界来证明后者定理,这是我们认为具有独立关注的结果。该结果还使我们能够分析经过良好研究的Trie参数(外部路径长度)的上尾概率,并表明其偏离其预期值的概率($ CN $)在$θ(n)$中是指数的。
Bucket Sort is known to run in expected linear time when the input keys are distributed independently and uniformly at random in the interval $[0,1)$. The analysis holds even when a quadratic time algorithm is used to sort the keys in each bucket. We show how to obtain linear time guarantees on the running time of Bucket Sort that hold with very high probability. Specifically, we investigate the asymptotic behavior of the exponent in the upper tail probability of the running time of Bucket Sort. We consider large additive deviations from the expectation, of the form $cn$ for large enough (constant) $c$, where $n$ is the number of keys that are sorted. Our analysis shows a profound difference between variants of Bucket Sort that use a quadratic time algorithm within each bucket and variants that use a $Θ(b\log b)$ time algorithm for sorting $b$ keys in a bucket. When a quadratic time algorithm is used to sort the keys in a bucket, the probability that Bucket Sort takes $cn$ more time than expected is exponential in $Θ(\sqrt{n}\log n)$. When a $Θ(b\log b)$ algorithm is used to sort the keys in a bucket, the exponent becomes $Θ(n)$. We prove this latter theorem by showing an upper bound on the tail of a random variable defined on tries, a result which we believe is of independent interest. This result also enables us to analyze the upper tail probability of a well-studied trie parameter, the external path length, and show that the probability that it deviates from its expected value by an additive factor of $cn$ is exponential in $Θ(n)$.