Practical Algorithms for Generating a Random Ordering of the Elements of a Weighted Set

Lang, Kevin J.

doi:10.1007/s00224-013-9496-6

Published: 31 August 2013

Theory of Computing Systems, volume 54, pages 659–688 (2014)

Abstract

This paper discusses the problem of efficiently generating a random ordering of the elements of an n-element weighted set, where the elements’ weights are interpreted as relative probabilities. It is assumed that only the following randomness primitives are available: flipping a biased coin, and selecting a random non-negative integer smaller than n. We review the existing literature on this problem, focusing on several simple algorithms whose running times are O(n²) and O(n log n). Asymptotically faster algorithms do exist, but most of them are either very complicated or require a stronger computational model. The main contribution of this paper is a self-contained specification of and correctness proof for an O(n log n) mergesort-like folklore algorithm that is often implemented incorrectly. The key piece of the correct algorithm can be viewed as the weighted generalization of the riffle merge operation which appears in mathematical models of card shuffling. Empirical results are also presented which suggest that this algorithm might be faster on a real computer than other simple algorithms which use the same randomness primitives, because its pattern of memory accesses results in fewer cache misses. Finally, three fancier but still implementable algorithms are described, analyzed, and simulated; they are asymptotically faster than O(n log n), but require the input weights to be partially sorted.
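The mergesort-like idea mentioned in the abstract can be illustrated with a short Python sketch. It is not the paper's specification, which is not reproduced on this page; it assumes the usual reading of a weighted random ordering (elements are drawn one at a time without replacement, each with probability proportional to its weight) and assumes that the merge step picks the next element from a sublist with probability proportional to that sublist's total remaining weight, not the weight of its head element. The names `weighted_shuffle` and `weighted_riffle_merge` are hypothetical, and the comparison against `rng.random()` stands in for the biased-coin primitive.

```python
import random

def weighted_riffle_merge(left, right, rng=random):
    """Merge two weighted random orderings of (element, weight) pairs.

    Assumption: the next element comes from `left` with probability
    (remaining weight of left) / (remaining weight of left and right).
    """
    wl = sum(w for _, w in left)    # total weight still in left
    wr = sum(w for _, w in right)   # total weight still in right
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        # Biased coin: heads with probability wl / (wl + wr).
        if rng.random() * (wl + wr) < wl:
            out.append(left[i])
            wl -= left[i][1]
            i += 1
        else:
            out.append(right[j])
            wr -= right[j][1]
            j += 1
    # One side is exhausted; the other keeps its own order.
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def weighted_shuffle(items, rng=random):
    """Return a weighted random ordering of (element, weight) pairs.

    Weights are assumed to be positive. Each level of the recursion
    does linear work, so the total cost is O(n log n).
    """
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = weighted_shuffle(items[:mid], rng)
    right = weighted_shuffle(items[mid:], rng)
    return weighted_riffle_merge(left, right, rng)
```

A call such as `weighted_shuffle([("a", 1), ("b", 2), ("c", 3)], random.Random(0))` returns the three pairs in a random order in which heavier elements tend to appear earlier. A plausible version of the mistake the abstract warns about is to bias the coin by the weights of the two head elements only, which does not give the same distribution.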

Notes

Disclaimer: this paper is not implying that any particular real-world system does or does not employ value maximization strategies of this type.
A shorter path to an algorithm with those same properties would be to combine the algorithm in Sect. 4 of this paper with the fact that the expected number of fair coin flips needed to simulate a biased coin is 2, independent of the biased coin’s probability [7, p. 769].
The actual Wong and Easton algorithm assumes the existence of a more powerful source of randomness that can pick out a specific leaf with a single call; it is still necessary to walk down a path to find the leaf.
The algorithm cuts the deck into subdecks of size t and n−t with probability \(\binom{n}{t} \cdot 2^{-n}\).
This claim does not apply to all possible algorithms that can be built on those primitives. For example, the algorithm q-alias in Sect. 6 does O(n²) work but only uses a randomness primitive O(n) times.
Because there is no need to instantiate more than n buckets even when U>2ⁿ, O(n log min(log U, n)) is a more accurate bound on the cost of this algorithm.
Our implementation dispenses with this complication and simply performs a single 2m-way random riffle merge of all the meta-buckets.
We remark that certain real-world quantities, such as personal income, or city size, might result in skewed bucket counts. Certainly, worldwide, there would be many fewer people falling into a bucket covering incomes between 1 and 2 billion dollars than falling into a bucket covering incomes between 1 and 2 thousand dollars.
For n=2¹⁹=524288, \(U = 2^{\sqrt{2^{19}}} \approx 9.3 \times 10^{217}\); we didn’t run the experiment for n=2²⁰ because \(U = 2^{\sqrt{2^{20}}}\) is bigger than the largest double precision IEEE float.
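
The note above on simulating a biased coin states that 2 fair coin flips suffice in expectation, whatever the bias. One standard way to see this, sketched below in Python, is to compare the binary digits of the bias p with lazily generated fair bits of a uniform number U and to stop at the first digit where they differ; each flip disagrees with the corresponding digit of p with probability 1/2, so the number of flips is geometric with mean 2. This is an illustrative sketch and is not claimed to be the construction given in [7]; the function name `biased_coin` and the `fair_flip` callable are hypothetical.

```python
import random

def biased_coin(p, fair_flip):
    """Simulate a coin that lands heads with probability p (0 <= p <= 1).

    fair_flip is assumed to be a callable returning True or False with
    probability 1/2 each. Returns (heads, flips_used).
    """
    flips = 0
    while True:
        # Extract the next binary digit of p (doubling a float is exact).
        p *= 2.0
        p_bit = p >= 1.0
        if p_bit:
            p -= 1.0
        # Next binary digit of the implicit uniform number U.
        u_bit = fair_flip()
        flips += 1
        if u_bit != p_bit:
            # First differing digit decides: U < p exactly when U has
            # a 0 where p has a 1.
            return (not u_bit), flips
```

For example, with `rng = random.Random(1)` and `fair_flip = lambda: rng.getrandbits(1) == 1`, averaging `flips_used` over many calls to `biased_coin(0.3, fair_flip)` should give a value close to 2.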

References

[1] Bayer, D., Diaconis, P.: Trailing the dovetail shuffle to its lair. Ann. Appl. Probab. 2(2), 294–313 (1992)
[2] Bela, B.: Contemporary Combinatorics p. 36. Springer, Budapest (2002)
[3] Bratley, P., Fox, B.L., Schrage, L.: A Guide to Simulation, 2nd edn. Springer, Berlin (1987)
[4] Callahan, P.B.: Output-sensitive generation of random events. In: Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 374–383 (1998)
[5] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press, Cambridge (2001)
[6] Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. Wiley, New York (2006)
[7] Devroye, L.: Non-Uniform Random Variate Generation. Springer, Berlin (1986)
[8] Flajolet, P., Saheb, N.: The complexity of generating an exponentially distributed variate. J. Algorithms 7(4), 463–488 (1986)
[9] Fox, B.L., Young, A.R.: Generating Markov–Chain transitions quickly: II. INFORMS J. Comput. 3(1), 3–11 (1991)
[10] Hagerup, T., Mehlhorn, K., Munro, J.I.: Maintaining discrete probability distributions optimally. In: International Colloquium on Automata, Languages and Programming (ICALP), pp. 253–264 (1993)
[11] Knuth, D.E.: The Art of Computer Programming, vol. 2: Seminumerical Algorithms, 2nd edn. Addison-Wesley, Reading (1981)
[12] Knuth, D.E.: The Art of Computer Programming, vol. 3: Sorting and Searching, 2nd edn. Addison-Wesley, Reading (1998)
[13] Lang, K.J.: Practical algorithms for generating a random ordering of the elements of a weighted set. In: Fun with Algorithms: 6th International Conference, pp. 270–281 (2012)
[14] van Leeuwen, J.: On the construction of Huffman trees. In: International Colloquium on Automata, Languages and Programming (ICALP), pp. 382–410 (1976)
[15] Linder, T., Tarokh, V., Zeger, K.: Existence of optimal prefix codes for infinite source alphabets. IEEE Trans. Inf. Theory 43(6), 2026–2028 (1997)
[16] Matias, Y., Vitter, J.S., Ni, W.-C.: Dynamic generation of discrete random variates. Theory Comput. Syst. 36(4), 329–358 (2003)
[17] Rajasekaran, S., Ross, K.W.: Fast algorithms for generating discrete random variates with changing distributions. ACM Trans. Model. Comput. Simul. 3(1), 1–19 (1993)
[18] Sedgewick, R., Schidlowsky, M.: Algorithms in Java, Third Edition, Parts 1–4: Fundamentals, Data Structures, Sorting, Searching. Addison-Wesley, Reading (1998)
[19] Walker, A.J.: An efficient method for generating discrete random variables with general distributions. ACM Trans. Math. Softw. 3(3), 253–256 (1977)
[20] Wong, C.K., Easton, M.C.: An efficient method for weighted sampling without replacement. SIAM J. Comput. 9(1), 111–113 (1980)

Acknowledgements

This problem arose during an engineering project conducted jointly with LLuis Garcia-Pueyo, Sergei Vassilvitskii, Suddha Basu, Dongming Jiang, and Joaquin Delgado. The author thanks Anirban DasGupta, Ravi Kumar, Lihong Li, and Martin Zinkevich for helpful discussions. We also thank the anonymous reviewers for their insightful comments, and for bringing the exponential clocks method to our attention.

Author information

Authors and Affiliations

Yahoo!, 701 1st Avenue, Sunnyvale, CA, 94089, USA
Kevin J. Lang

Corresponding author

Correspondence to Kevin J. Lang.

Additional information

This paper is an extended version of the conference paper [13].

About this article

Cite this article

Lang, K.J. Practical Algorithms for Generating a Random Ordering of the Elements of a Weighted Set. Theory Comput Syst 54, 659–688 (2014). https://doi.org/10.1007/s00224-013-9496-6

Issue Date: May 2014
DOI: https://doi.org/10.1007/s00224-013-9496-6
