Practical Algorithms for Generating a Random Ordering of the Elements of a Weighted Set

Theory of Computing Systems

Abstract

This paper discusses the problem of efficiently generating a random ordering of the elements of an n-element weighted set, where the elements’ weights are interpreted as relative probabilities. It is assumed that only the following randomness primitives are available: flipping a biased coin, and selecting a random non-negative integer smaller than n. We review the existing literature on this problem, focusing on several simple algorithms whose running times are \(O(n^2)\) and \(O(n \log n)\). Asymptotically faster algorithms do exist, but most of them are either very complicated or require a stronger computational model. The main contribution of this paper is a self-contained specification of, and correctness proof for, an \(O(n \log n)\) mergesort-like folklore algorithm that is often implemented incorrectly. The key piece of the correct algorithm can be viewed as the weighted generalization of the riffle merge operation that appears in mathematical models of card shuffling. Empirical results are also presented which suggest that this algorithm might be faster on a real computer than other simple algorithms that use the same randomness primitives, because its pattern of memory accesses results in fewer cache misses. Finally, three fancier but still implementable algorithms are described, analyzed, and simulated; they are asymptotically faster than \(O(n \log n)\), but require the input weights to be partially sorted.
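As an illustration of the mergesort-like approach mentioned in the abstract, here is a minimal sketch, not the paper's own code. It assumes the target distribution is the one obtained by repeatedly drawing an element without replacement with probability proportional to its weight, that all weights are positive, and that `rng.random()` may stand in for the biased-coin primitive. The names `weighted_riffle_merge` and `weighted_shuffle` are hypothetical. The merge takes the next element from a sublist with probability proportional to that sublist's total remaining weight; a variant that compares only the weights of the two head elements does not produce the same distribution.

```python
import random


def weighted_riffle_merge(a, b, rng):
    """Merge two already randomly ordered lists of (item, weight) pairs.

    The next output element comes from `a` with probability
    (remaining weight of a) / (remaining weight of a and b together).
    """
    wa = sum(w for _, w in a)  # total remaining weight in a
    wb = sum(w for _, w in b)  # total remaining weight in b
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        # Biased coin flip: heads with probability wa / (wa + wb).
        if rng.random() * (wa + wb) < wa:
            out.append(a[i])
            wa -= a[i][1]
            i += 1
        else:
            out.append(b[j])
            wb -= b[j][1]
            j += 1
    # One side is exhausted; the rest of the other side keeps its order.
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def weighted_shuffle(pairs, rng=None):
    """Return a random ordering of (item, weight) pairs in O(n log n) time."""
    rng = rng or random.Random()
    if len(pairs) <= 1:
        return list(pairs)
    mid = len(pairs) // 2
    left = weighted_shuffle(pairs[:mid], rng)
    right = weighted_shuffle(pairs[mid:], rng)
    return weighted_riffle_merge(left, right, rng)
```

For example, `weighted_shuffle([("a", 1), ("b", 2), ("c", 3)])` should place `"c"` first about half the time. With floating-point weights the running totals can drift slightly from repeated subtraction; integer weights avoid this.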

Notes

  1. Disclaimer: this paper is not implying that any particular real-world system does or does not employ value maximization strategies of this type.

  2. A shorter path to an algorithm with those same properties would be to combine the algorithm in Sect. 4 of this paper with the fact that the expected number of fair coin flips needed to simulate a biased coin is 2, independent of the biased coin’s probability [7, p. 769].

  3. The actual Wong and Easton algorithm assumes the existence of a more powerful source of randomness that can pick out a specific leaf with a single call; it is still necessary to walk down a path to find the leaf.

  4. The algorithm cuts the deck into subdecks of sizes \(t\) and \(n-t\) with probability \(\binom{n}{t} \cdot 2^{-n}\).

  5. This claim does not apply to all possible algorithms that can be built on those primitives. For example, the algorithm q-alias in Sect. 6 does \(O(n^2)\) work but only uses a randomness primitive \(O(n)\) times.

  6. Because there is no need to instantiate more than \(n\) buckets even when \(U > 2^n\), \(O(n \log \min(\log U, n))\) is a more accurate bound on the cost of this algorithm.

  7. Our implementation dispenses with this complication and simply performs a single 2m-way random riffle merge of all the meta-buckets.

  8. We remark that certain real-world quantities, such as personal income, or city size, might result in skewed bucket counts. Certainly, worldwide, there would be many fewer people falling into a bucket covering incomes between 1 and 2 billion dollars than falling into a bucket covering incomes between 1 and 2 thousand dollars.

  9. For \(n = 2^{19} = 524288\), \(U = 2^{\sqrt{2^{19}}} \approx 9.3 \times 10^{217}\); we didn’t run the experiment for \(n = 2^{20}\) because \(U = 2^{\sqrt{2^{20}}}\) is bigger than the largest double precision IEEE float.
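Note 2 relies on the fact that a biased coin can be simulated with an expected two fair coin flips. The sketch below shows one standard way to do this and is not necessarily the construction given in [7]: it compares fair bits against the binary expansion of p and stops at the first disagreement. The function name `biased_coin` and the `fair_bit` callback are hypothetical, and p is assumed to be a Python float in [0, 1].

```python
import random


def biased_coin(p, fair_bit):
    """Return True with probability p, using only calls to fair_bit().

    fair_bit() must return 0 or 1, each with probability 1/2. The fair
    bits are read as the binary digits of a uniform number U in [0, 1),
    and the result is the outcome of the comparison U < p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 1.0:
        return True
    while p > 0.0:
        p *= 2.0                       # shift the next binary digit of p out
        bit = 1 if p >= 1.0 else 0
        p -= bit
        if fair_bit() != bit:
            # First disagreement decides: U < p exactly when p's digit is 1.
            return bit == 1
    # Every digit of p has matched and the rest are zero, so U >= p.
    return False
```

Each loop iteration ends the comparison with probability 1/2, so the number of fair flips is bounded by a geometric random variable with mean 2. A usage example under the same assumptions: `rng = random.Random(); biased_coin(0.3, lambda: rng.getrandbits(1))`.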

References

  1. Bayer, D., Diaconis, P.: Trailing the dovetail shuffle to its lair. Ann. Appl. Probab. 2(2), 294–313 (1992)

  2. Bela, B.: Contemporary Combinatorics, p. 36. Springer, Budapest (2002)

  3. Bratley, P., Fox, B.L., Schrage, L.: A Guide to Simulation, 2nd edn. Springer, Berlin (1987)

  4. Callahan, P.B.: Output-sensitive generation of random events. In: Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 374–383 (1998)

  5. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press, Cambridge (2001)

  6. Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. Wiley, New York (2006)

  7. Devroye, L.: Non-Uniform Random Variate Generation. Springer, Berlin (1986)

  8. Flajolet, P., Saheb, N.: The complexity of generating an exponentially distributed variate. J. Algorithms 7(4), 463–488 (1986)

  9. Fox, B.L., Young, A.R.: Generating Markov–Chain transitions quickly: II. INFORMS J. Comput. 3(1), 3–11 (1991)

  10. Hagerup, T., Mehlhorn, K., Munro, J.I.: Maintaining discrete probability distributions optimally. In: International Colloquium on Automata, Languages and Programming (ICALP), pp. 253–264 (1993)

  11. Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. II, 2nd edn. Addison-Wesley, Reading (1981)

  12. Knuth, D.E.: The Art of Computer Programming. Sorting and Searching, vol. III, 2nd edn. Addison-Wesley, Reading (1998)

  13. Lang, K.J.: Practical algorithms for generating a random ordering of the elements of a weighted set. In: Fun with Algorithms - 6th International Conference, pp. 270–281 (2012)

  14. van Leeuwen, J.: On the construction of Huffman trees. In: International Colloquium on Automata, Languages and Programming (ICALP), pp. 382–410 (1976)

  15. Linder, T., Tarokh, V., Zeger, K.: Existence of optimal prefix codes for infinite source alphabets. IEEE Trans. Inf. Theory 43(6), 2026–2028 (1997)

  16. Matias, Y., Vitter, J.S., Ni, W.-C.: Dynamic generation of discrete random variates. Theory Comput. Syst. 36(4), 329–358 (2003)

  17. Rajasekaran, S., Ross, K.W.: Fast algorithms for generating discrete random variates with changing distributions. ACM Trans. Model. Comput. Simul. 3(1), 1–19 (1993)

  18. Sedgewick, R., Schidlowsky, M.: Algorithms in Java, Third Edition, Parts 1–4: Fundamentals, Data Structures, Sorting, Searching. Addison-Wesley, Reading (1998)

  19. Walker, A.J.: An efficient method for generating discrete random variables with general distributions. ACM Trans. Math. Softw. 3(3), 253–256 (1977)

  20. Wong, C.K., Easton, M.C.: An efficient method for weighted sampling without replacement. SIAM J. Comput. 9(1), 111–113 (1980)

Acknowledgements

This problem arose during an engineering project conducted jointly with Lluis Garcia-Pueyo, Sergei Vassilvitskii, Suddha Basu, Dongming Jiang, and Joaquin Delgado. The author thanks Anirban DasGupta, Ravi Kumar, Lihong Li, and Martin Zinkevich for helpful discussions. We also thank the anonymous reviewers for their insightful comments, and for bringing the exponential clocks method to our attention.

Author information

Corresponding author

Correspondence to Kevin J. Lang.

Additional information

This paper is an extended version of the conference paper [13].

About this article

Cite this article

Lang, K.J. Practical Algorithms for Generating a Random Ordering of the Elements of a Weighted Set. Theory Comput Syst 54, 659–688 (2014). https://doi.org/10.1007/s00224-013-9496-6

  • DOI: https://doi.org/10.1007/s00224-013-9496-6
