论文标题

从离散概率分布中进行的最佳近似采样

Optimal Approximate Sampling from Discrete Probability Distributions

论文作者

Saad, Feras A., Freer, Cameron E., Rinard, Martin C., Mansinghka, Vikash K.

论文摘要

本文解决了随机变化生成中的一个基本问题:给定访问一个随机来源,该源排出了独立的公平位,什么是从离散概率分布$(p_1,\ dots,p_n)$进行采样的最准确和熵效率的算法,如果输出分布$(p_1,\ dots,p_n)$(必须使用最多的$ k $精确度来指定采样算法的?我们提出了一个理论框架,用于制定此问题,并提供新的技术,用于查找统计学上最佳的采样算法(就采样精度而言)和信息(从熵消耗的意义上)。我们利用这些结果来构建一个系统,该系统对于统计准确性的广泛度量,提供了一种采样算法,其预期的熵用法在诱导相同分布的人中的预期用法最小(即“熵 - 偏好”)和其输出$(\ hat {p} _1 _1,\ dots $ a_ lats $ a__________________ $(p_1,\ dots,p_n)$中的所有熵 - 最佳采样算法都在指定的$ k $ bit Precision中运行。该最佳近似采样器也比任何(可能是熵 - 次点的)采样器更接近近似,该采样器使用指定的精度消耗了有限量的熵,其中包括反转抽样的浮点数和相关方法的浮点数。我们在广泛的分布中评估了我们最佳近似采样算法的准确性,熵消耗,精度要求和壁式运行时,这证明了它们优于现有近似采样器的方式,并确定它们通常会比精确采样器所需的资源要少得多。

This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm must be specified using at most $k$ bits of precision? We present a theoretical framework for formulating this problem and provide new techniques for finding sampling algorithms that are optimal both statistically (in the sense of sampling accuracy) and information-theoretically (in the sense of entropy consumption). We leverage these results to build a system that, for a broad family of measures of statistical accuracy, delivers a sampling algorithm whose expected entropy usage is minimal among those that induce the same distribution (i.e., is "entropy-optimal") and whose output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ is a closest approximation to the target distribution $(p_1, \dots, p_n)$ among all entropy-optimal sampling algorithms that operate within the specified $k$-bit precision. This optimal approximate sampler is also a closer approximation than any (possibly entropy-suboptimal) sampler that consumes a bounded amount of entropy with the specified precision, a class which includes floating-point implementations of inversion sampling and related methods found in many software libraries. We evaluate the accuracy, entropy consumption, precision requirements, and wall-clock runtime of our optimal approximate sampling algorithms on a broad set of distributions, demonstrating the ways that they are superior to existing approximate samplers and establishing that they often consume significantly fewer resources than are needed by exact samplers.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源