Paper Title
Optimal Approximate Sampling from Discrete Probability Distributions
Paper Authors
Paper Abstract
This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm must be specified using at most $k$ bits of precision? We present a theoretical framework for formulating this problem and provide new techniques for finding sampling algorithms that are optimal both statistically (in the sense of sampling accuracy) and information-theoretically (in the sense of entropy consumption). We leverage these results to build a system that, for a broad family of measures of statistical accuracy, delivers a sampling algorithm whose expected entropy usage is minimal among those that induce the same distribution (i.e., is "entropy-optimal") and whose output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ is a closest approximation to the target distribution $(p_1, \dots, p_n)$ among all entropy-optimal sampling algorithms that operate within the specified $k$-bit precision. This optimal approximate sampler is also a closer approximation than any (possibly entropy-suboptimal) sampler that consumes a bounded amount of entropy with the specified precision, a class which includes floating-point implementations of inversion sampling and related methods found in many software libraries. We evaluate the accuracy, entropy consumption, precision requirements, and wall-clock runtime of our optimal approximate sampling algorithms on a broad set of distributions, demonstrating the ways that they are superior to existing approximate samplers and establishing that they often consume significantly fewer resources than are needed by exact samplers.
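As a rough illustration of the problem setting only (not the paper's construction), the sketch below rounds a target distribution to a dyadic $k$-bit approximation and then samples from it with fair coin flips via the classical Knuth-Yao DDG-tree walk, which is entropy-optimal for the dyadic distribution it realizes. The rounding step here is a naive largest-remainder scheme, the function names (`dyadic_approximation`, `knuth_yao_sample`) are illustrative, and the closest-approximation guarantees described in the abstract are not reproduced.

```python
import random

def dyadic_approximation(p, k):
    """Round probabilities p to integers m with sum(m) == 2**k, so that
    m[i] / 2**k approximates p[i].

    NOTE: this naive largest-remainder rounding is only illustrative; the
    paper constructs the closest k-bit approximation under a chosen
    measure of statistical accuracy, which this sketch does not attempt.
    """
    scale = 1 << k
    raw = [pi * scale for pi in p]
    m = [int(r) for r in raw]
    # Hand the leftover mass to the entries with the largest remainders.
    leftover = scale - sum(m)
    order = sorted(range(len(p)), key=lambda i: raw[i] - m[i], reverse=True)
    for i in order[:leftover]:
        m[i] += 1
    return m

def knuth_yao_sample(m, k, flip=lambda: random.getrandbits(1)):
    """Draw index i with probability m[i] / 2**k using independent fair bits.

    Implicitly walks the Knuth-Yao DDG tree: at depth d, outcome i owns one
    leaf iff bit (k - d) of m[i] is set.  Because sum(m) == 2**k, the walk
    terminates by depth k, and the expected number of consumed bits is
    within 2 of the entropy of the dyadic distribution.
    """
    assert sum(m) == (1 << k)
    c, d = 0, 0                # index of the current internal node, and its depth
    while True:
        pos = 2 * c + flip()   # child position among the nodes at depth d + 1
        d += 1
        # Leaves at depth d occupy the lowest positions, ordered by outcome.
        acc = 0
        for i, mi in enumerate(m):
            if (mi >> (k - d)) & 1:
                if pos == acc:
                    return i
                acc += 1
        c = pos - acc          # otherwise the node is internal; descend

# Example: sample from a 16-bit approximation of (0.5, 0.3, 0.2).
if __name__ == "__main__":
    k = 16
    m = dyadic_approximation([0.5, 0.3, 0.2], k)
    counts = [0] * len(m)
    for _ in range(100_000):
        counts[knuth_yao_sample(m, k)] += 1
    print(m, counts)
```

The key design point this sketch shares with the setting above is that all randomness comes from single fair bits, consumed lazily, so the expected entropy usage tracks the entropy of the realized $k$-bit output distribution rather than a fixed word size.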