Abstract
This paper discusses the problem of efficiently generating a random ordering of the elements of an n-element weighted set, where the elements' weights are interpreted as relative probabilities. It is assumed that only the following randomness primitives are available: flipping a biased coin, and selecting a random non-negative integer smaller than n. We review the existing literature on this problem, focusing on several simple algorithms whose running times are \(O(n^2)\) and \(O(n \log n)\). Asymptotically faster algorithms do exist, but most of them are either very complicated or require a stronger computational model. The main contribution of this paper is a self-contained specification of, and correctness proof for, an \(O(n \log n)\) mergesort-like folklore algorithm that is often implemented incorrectly. The key piece of the correct algorithm can be viewed as the weighted generalization of the riffle merge operation that appears in mathematical models of card shuffling. Empirical results are also presented which suggest that this algorithm might be faster on a real computer than other simple algorithms that use the same randomness primitives, because its pattern of memory accesses results in fewer cache misses. Finally, three fancier but still implementable algorithms are described, analyzed, and simulated; they are asymptotically faster than \(O(n \log n)\), but require the input weights to be partially sorted.
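As a rough illustration of the mergesort-like approach described above, the following Python sketch assumes that the weighted riffle merge takes the next element from a sublist with probability proportional to that sublist's total remaining weight. The names `biased_coin`, `weighted_riffle_merge`, and `weighted_shuffle` are hypothetical, and `rng.random() < p` merely stands in for the biased-coin primitive; this is a sketch under those assumptions, not the paper's specification.

```python
import random

def biased_coin(p, rng):
    """Return True with probability p (stand-in for the biased-coin primitive)."""
    return rng.random() < p

def weighted_riffle_merge(left, right, rng):
    """Merge two lists of (item, weight) pairs that are already randomly ordered.

    Assumption: the next element comes from a list with probability
    proportional to the total weight still remaining in that list.
    Weights must be positive.
    """
    wl = sum(w for _, w in left)    # total remaining weight on the left
    wr = sum(w for _, w in right)   # total remaining weight on the right
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if biased_coin(wl / (wl + wr), rng):
            out.append(left[i])
            wl -= left[i][1]
            i += 1
        else:
            out.append(right[j])
            wr -= right[j][1]
            j += 1
    # One side is exhausted; the rest of the other side keeps its order.
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def weighted_shuffle(items, rng=None):
    """Return a weighted random ordering of (item, weight) pairs."""
    rng = rng or random.Random()
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return weighted_riffle_merge(
        weighted_shuffle(items[:mid], rng),
        weighted_shuffle(items[mid:], rng),
        rng,
    )
```

For example, `weighted_shuffle([("a", 1), ("b", 2), ("c", 3)])` should place `"c"` first about half of the time. With floating-point weights the running subtractions can accumulate rounding error, so integer weights are the safer choice when experimenting.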
Notes
Disclaimer: this paper is not implying that any particular real-world system does or does not employ value maximization strategies of this type.
The actual Wong and Easton algorithm assumes the existence of a more powerful source of randomness that can pick out a specific leaf with a single call; it is still necessary to walk down a path to find the leaf.
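The path walk mentioned in this note can be sketched in Python as follows. This is not the Wong and Easton algorithm itself: it assumes a complete binary tree stored in an array, with each internal node holding the total weight of the leaves below it, and uses one biased coin flip per level. The class name `WeightTree` and the function `weighted_order` are hypothetical.

```python
import random

class WeightTree:
    """Complete binary tree whose internal nodes store subtree weight sums."""

    def __init__(self, weights):
        self.size = 1
        while self.size < len(weights):
            self.size *= 2
        # Leaves live at positions size .. 2*size-1; padding leaves weigh 0.
        self.tree = [0] * (2 * self.size)
        for i, w in enumerate(weights):
            self.tree[self.size + i] = w
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def pop_random(self, rng):
        """Walk from the root to a leaf, then remove that leaf's weight."""
        node = 1
        while node < self.size:
            left = 2 * node
            # Biased coin: go left with probability weight(left) / weight(node).
            if rng.random() * self.tree[node] < self.tree[left]:
                node = left
            else:
                node = left + 1
        index = node - self.size
        # Zero the leaf and recompute the sums along the path to the root.
        self.tree[node] = 0
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2
        return index

def weighted_order(weights, rng=None):
    """Return the indices of positive weights in a weighted random order."""
    rng = rng or random.Random()
    tree = WeightTree(weights)
    return [tree.pop_random(rng) for _ in range(len(weights))]
```

Each call to `pop_random` touches one root-to-leaf path twice, so producing a full ordering of n elements costs \(O(n \log n)\) in this sketch.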
The algorithm cuts the deck into subdecks of size t and n−t with probability \(\binom{n}{t} \cdot 2^{-n}\).
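One way to realize this cut distribution with fair coin flips is sketched below in Python: counting heads among n fair flips gives a cut size t with exactly the binomial probability stated above. The function names `gsr_cut_size` and `cut_probability` are hypothetical, and `rng.random() < 0.5` is assumed as the fair-coin primitive.

```python
import random
from math import comb

def gsr_cut_size(n, rng):
    """Sample the top subdeck size t by counting heads in n fair coin flips."""
    return sum(1 for _ in range(n) if rng.random() < 0.5)

def cut_probability(n, t):
    """Probability of cutting into subdecks of size t and n - t."""
    return comb(n, t) / 2 ** n
```

For a 4-card deck, `cut_probability(4, 2)` evaluates to 6/16 = 0.375, so an even cut is the single most likely outcome.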
This claim does not apply to all possible algorithms that can be built on those primitives. For example, the algorithm q-alias in Sect. 6 does \(O(n^2)\) work but only uses a randomness primitive \(O(n)\) times.
Because there is no need to instantiate more than n buckets even when \(U > 2^n\), \(O(n \log \min(\log U, n))\) is a more accurate bound on the cost of this algorithm.
Our implementation dispenses with this complication and simply performs a single 2m-way random riffle merge of all the meta-buckets.
We remark that certain real-world quantities, such as personal income, or city size, might result in skewed bucket counts. Certainly, worldwide, there would be many fewer people falling into a bucket covering incomes between 1 and 2 billion dollars than falling into a bucket covering incomes between 1 and 2 thousand dollars.
For \(n = 2^{19} = 524288\), \(U = 2^{\sqrt{2^{19}}} \approx 9.3 \times 10^{217}\); we didn't run the experiment for \(n = 2^{20}\) because \(U = 2^{\sqrt{2^{20}}}\) is bigger than the largest double precision IEEE float.
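The arithmetic in this note can be checked with a short Python sketch. The helper name `universe_size` is hypothetical; the check relies on Python raising `OverflowError` when a float power exceeds the double precision range. Since \(\sqrt{2^{20}} = 1024\), the second value would be \(2^{1024}\), which is just beyond the largest finite double.

```python
import math
import sys

def universe_size(n):
    """Return U = 2**sqrt(n) as a float; raises OverflowError if too large."""
    return 2.0 ** math.sqrt(n)

# n = 2**19: the exponent is about 724.08, so U is still a finite double.
u19 = universe_size(2 ** 19)

# n = 2**20: the exponent is exactly 1024, one past the largest usable power of two.
try:
    u20 = universe_size(2 ** 20)
except OverflowError:
    u20 = None  # not representable as a double

print(u19, u20, sys.float_info.max)
```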
References
Bayer, D., Diaconis, P.: Trailing the dovetail shuffle to its lair. Ann. Appl. Probab. 2(2), 294–313 (1992)
Bela, B.: Contemporary Combinatorics p. 36. Springer, Budapest (2002)
Bratley, P., Fox, B.L., Schrage, L.: A Guide to Simulation, 2nd edn. Springer, Berlin (1987)
Callahan, P.B.: Output-sensitive generation of random events. In: Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 374–383 (1998)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press, Cambridge (2001)
Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. Wiley, New York (2006)
Devroye, L.: Non-Uniform Random Variate Generation. Springer, Berlin (1986)
Flajolet, P., Saheb, N.: The complexity of generating an exponentially distributed variate. J. Algorithms 7(4), 463–488 (1986)
Fox, B.L., Young, A.R.: Generating Markov–Chain transitions quickly: II. INFORMS J. Comput. 3(1), 3–11 (1991)
Hagerup, T., Mehlhorn, K., Munro, J.I.: Maintaining discrete probability distributions optimally. In: International Colloquium on Automata, Languages and Programming (ICALP), pp. 253–264 (1993)
Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. II, 2nd edn. Addison-Wesley, Reading (1981)
Knuth, D.E.: The Art of Computer Programming. Sorting and Searching, vol. 3, 2nd edn. Addison-Wesley, Reading (1998)
Lang, K.J.: Practical algorithms for generating a random ordering of the elements of a weighted set. In: Fun with Algorithms—6th International Conference, pp. 270–281 (2012)
van Leeuwen, J.: On the construction of Huffman trees. In: International Colloquium on Automata, Languages and Programming (ICALP), pp. 382–410 (1976)
Linder, T., Tarokh, V., Zeger, K.: Existence of optimal prefix codes for infinite source alphabets. IEEE Trans. Inf. Theory 43(6), 2026–2028 (1997)
Matias, Y., Vitter, J.S., Ni, W.-C.: Dynamic generation of discrete random variates. Theory Comput. Syst. 36(4), 329–358 (2003)
Rajasekaran, S., Ross, K.W.: Fast algorithms for generating discrete random variates with changing distributions. ACM Trans. Model. Comput. Simul. 3(1), 1–19 (1993)
Sedgewick, R., Schidlowsky, M.: Algorithms in Java, Third Edition, Parts 1–4: Fundamentals, Data Structures, Sorting, Searching. Addison-Wesley, Reading (1998)
Walker, A.J.: An efficient method for generating discrete random variables with general distributions. ACM Trans. Math. Softw. 3(3), 253–256 (1977)
Wong, C.K., Easton, M.C.: An efficient method for weighted sampling without replacement. SIAM J. Comput. 9(1), 111–113 (1980)
Acknowledgements
This problem arose during an engineering project conducted jointly with Lluis Garcia-Pueyo, Sergei Vassilvitskii, Suddha Basu, Dongming Jiang, and Joaquin Delgado. The author thanks Anirban DasGupta, Ravi Kumar, Lihong Li, and Martin Zinkevich for helpful discussions. We also thank the anonymous reviewers for their insightful comments, and for bringing the exponential clocks method to our attention.
Additional information
This paper is an extended version of the conference paper [13].
Cite this article
Lang, K.J. Practical Algorithms for Generating a Random Ordering of the Elements of a Weighted Set. Theory Comput Syst 54, 659–688 (2014). https://doi.org/10.1007/s00224-013-9496-6
DOI: https://doi.org/10.1007/s00224-013-9496-6