Paper Supplement Video Code Experiments Link abstract bibtex 60 downloads

This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, ṡ, p_n)$, where the probabilities of the output distribution $(p̂_1, ṡ, p̂_n)$ of the sampling algorithm must be specified using at most $k$ bits of precision? We present a theoretical framework for formulating this problem and provide new techniques for finding sampling algorithms that are optimal both statistically (in the sense of sampling accuracy) and information-theoretically (in the sense of entropy consumption). We leverage these results to build a system that, for a broad family of measures of statistical accuracy, delivers a sampling algorithm whose expected entropy usage is minimal among those that induce the same distribution (i.e., is ``entropy-optimal'') and whose output distribution $(p̂_1, ṡ, p̂_n)$ is a closest approximation to the target distribution $(p_1, ṡ, p_n)$ among all entropy-optimal sampling algorithms that operate within the specified $k$-bit precision. This optimal approximate sampler is also a closer approximation than any (possibly entropy-suboptimal) sampler that consumes a bounded amount of entropy with the specified precision, a class which includes floating-point implementations of inversion sampling and related methods found in many software libraries. We evaluate the accuracy, entropy consumption, precision requirements, and wall-clock runtime of our optimal approximate sampling algorithms on a broad set of distributions, demonstrating the ways that they are superior to existing approximate samplers and establishing that they often consume significantly fewer resources than are needed by exact samplers.

@article{saad2020sampling, title = {Optimal approximate sampling from discrete probability distributions}, author = {Saad, Feras A. and Freer, Cameron E. and Rinard, Martin C. and Mansinghka, Vikash K.}, journal = {Proceedings of the ACM on Programming Languages}, volume = 4, number = {POPL}, month = jan, year = 2020, pages = {36:1--36:31}, articleno = 36, numpages = 31, publisher = {ACM}, keywords = {discrete random variables, random variate generation}, url_paper = {https://dl.acm.org/ft_gateway.cfm?id=3371104}, url_supplement = {https://dl.acm.org/action/downloadSupplement?doi=10.1145%2F3371104&file=popl20main-p126-p-aux.zip&download=true}, url_video = {https://www.youtube.com/watch?v=Eb0CtuK7K3g}, url_code = {https://github.com/probcomp/optimal-approximate-sampling}, url_experiments = {https://doi.org/10.1145/3373106}, url_link = {https://doi.org/10.1145/3371104}, abstract = {This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm must be specified using at most $k$ bits of precision? We present a theoretical framework for formulating this problem and provide new techniques for finding sampling algorithms that are optimal both statistically (in the sense of sampling accuracy) and information-theoretically (in the sense of entropy consumption). We leverage these results to build a system that, for a broad family of measures of statistical accuracy, delivers a sampling algorithm whose expected entropy usage is minimal among those that induce the same distribution (i.e., is ``entropy-optimal'') and whose output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ is a closest approximation to the target distribution $(p_1, \dots, p_n)$ among all entropy-optimal sampling algorithms that operate within the specified $k$-bit precision. This optimal approximate sampler is also a closer approximation than any (possibly entropy-suboptimal) sampler that consumes a bounded amount of entropy with the specified precision, a class which includes floating-point implementations of inversion sampling and related methods found in many software libraries. We evaluate the accuracy, entropy consumption, precision requirements, and wall-clock runtime of our optimal approximate sampling algorithms on a broad set of distributions, demonstrating the ways that they are superior to existing approximate samplers and establishing that they often consume significantly fewer resources than are needed by exact samplers.}, }

Downloads: 60