Upper Tail Analysis of Bucket Sort and Random Tries

  • Conference paper
Algorithms and Complexity (CIAC 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12701)

Abstract

Bucket Sort is known to run in expected linear time when the input keys are distributed independently and uniformly at random in the interval [0, 1). The analysis holds even when a quadratic time algorithm is used to sort the keys in each bucket. We show how to obtain linear time guarantees on the running time of Bucket Sort that hold with very high probability. Specifically, we investigate the asymptotic behavior of the exponent in the upper tail probability of the running time of Bucket Sort. We consider large additive deviations from the expectation, of the form cn for large enough (constant) c, where n is the number of keys that are sorted.
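As a minimal illustration of the setting described above, the following Python sketch sorts n keys from [0, 1) using insertion sort inside each bucket. The choice of n equal-width buckets, the names `uniform_keys`, `insertion_sort`, and `bucket_sort`, and the use of the sum of squared bucket sizes as a stand-in for the quadratic in-bucket cost are assumptions made for this sketch, not details taken from the paper.

```python
import random


def uniform_keys(n, seed=None):
    """Draw n independent keys uniformly at random from [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def insertion_sort(keys):
    """Sort a list in place with insertion sort (quadratic time in the worst case)."""
    for i in range(1, len(keys)):
        current = keys[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and keys[j] > current:
            keys[j + 1] = keys[j]
            j -= 1
        keys[j + 1] = current
    return keys


def bucket_sort(keys):
    """Sort keys in [0, 1) and return (sorted_keys, sum_of_squared_bucket_sizes).

    Assumption of this sketch: n = len(keys) equal-width buckets are used.
    """
    n = len(keys)
    if n == 0:
        return [], 0
    buckets = [[] for _ in range(n)]
    for x in keys:
        # min(...) guards against a floating-point product landing on n.
        buckets[min(int(x * n), n - 1)].append(x)
    # Proxy for the total cost of a quadratic sort applied to every bucket.
    squared_sizes = sum(len(b) ** 2 for b in buckets)
    result = []
    for b in buckets:
        result.extend(insertion_sort(b))
    return result, squared_sizes
```

For independent uniform keys and n buckets, each bucket size is binomial with parameters n and 1/n, so the expected sum of squared bucket sizes is 2n - 1. The upper tail question is how likely this quantity is to exceed its expectation by cn.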

Our analysis shows a profound difference between variants of Bucket Sort that use a quadratic time algorithm within each bucket and variants that use a \(\varTheta (b\log b)\) time algorithm for sorting b keys in a bucket. When a quadratic time algorithm is used to sort the keys in a bucket, the probability that Bucket Sort takes cn more time than expected is exponential in \(\varTheta (\sqrt{n}\log n)\). When a \(\varTheta (b\log b)\) algorithm is used to sort the keys in a bucket, the exponent becomes \(\varTheta (n)\). We prove this latter theorem by showing an upper bound on the tail of a random variable defined on tries, a result which we believe is of independent interest. This result also enables us to analyze the upper tail probability of a well-studied trie parameter, the external path length, and show that the probability that it deviates from its expected value by an additive factor of cn is exponential in \(\varTheta (n)\).
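To make the trie parameter mentioned above concrete, here is a small Python sketch that computes the external path length of a binary trie, under the usual definition in which each key is stored at the shallowest node whose prefix is unique to it and the external path length is the sum of the depths of these leaves. Representing keys as distinct fixed-width integers, and the names `random_distinct_keys` and `external_path_length`, are assumptions of this sketch rather than constructions from the paper.

```python
import random


def random_distinct_keys(n, bits, seed=None):
    """Draw n distinct random integers of `bits` bits (finite stand-ins for random strings)."""
    rng = random.Random(seed)
    keys = set()
    while len(keys) < n:
        keys.add(rng.getrandbits(bits))
    return list(keys)


def external_path_length(keys, bits, depth=0):
    """Sum of leaf depths in the binary trie built from distinct `bits`-bit keys.

    A subtree holding at most one key is a leaf at the current depth.
    """
    if len(keys) <= 1:
        return depth * len(keys)
    # Branch on the bit at position `depth`, counted from the most significant bit.
    shift = bits - 1 - depth
    zeros = [k for k in keys if not (k >> shift) & 1]
    ones = [k for k in keys if (k >> shift) & 1]
    return (external_path_length(zeros, bits, depth + 1)
            + external_path_length(ones, bits, depth + 1))
```

For example, the 2-bit keys 00, 01, and 10 give leaves at depths 2, 2, and 1, so `external_path_length([0, 1, 2], 2)` evaluates to 5.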

This research was supported by a grant from the United States-Israel Binational Science Foundation (BSF), Jerusalem, Israel, and the United States National Science Foundation (NSF).

A full version of this paper can be found at [1].

Notes

  1. Throughout the paper, \(\ln x\) denotes the natural logarithm of x and \(\log x\) denotes the logarithm of base 2 of x.

  2. One should not confuse this analysis with concentration bounds that address small deviations from the expectation.

  3. The threshold C depends on: (1) the constant that appears in the sorting algorithm used within each bucket, and (2) the constant that appears in the expected running time of \(b^2\)-Bucket Sort.

  4. Interestingly, the sum of squares of bin occupancies, i.e., \(f(\mathbf {b})\), also appears in the FKS perfect hashing construction [8].

  5. Formally, T(L) may contain a subset of these n nodes. If a node \(v_j\) at depth \(\log n\) is not chosen by any string, then define \(C_j=0\).

  6. Note that RVs \(\left\{ \varDelta _i\right\} _i\) are not independent and probably not even negatively associated. Hence, standard concentration bounds do not apply to \(\sum \varDelta _i\).

References

  1. Bercea, I.O., Even, G.: Upper tail analysis of bucket sort and random tries. CoRR abs/2002.10499 (2020). https://arxiv.org/abs/2002.10499

  2. Clément, J., Flajolet, P., Vallée, B.: Dynamical sources in information theory: a general analysis of trie structures. Algorithmica 29(1–2), 307–369 (2001)

  3. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2009)

  4. Devroye, L.: Lecture Notes on Bucket Algorithms, vol. 12. Birkhäuser Boston (1986)

  5. Doerr, B.: Probabilistic tools for the analysis of randomized optimization heuristics. CoRR abs/1801.06733 (2018). http://arxiv.org/abs/1801.06733

  6. Dubhashi, D.P., Panconesi, A.: Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, Cambridge (2009)

  7. Fill, J.A., Janson, S.: Quicksort asymptotics. J. Algorithms 44(1), 4–28 (2002)

  8. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with \(O(1)\) worst case access time. In: 23rd Annual Symposium on Foundations of Computer Science, pp. 165–169. IEEE (1982)

  9. Jacquet, P., Regnier, M.: Normal limiting distribution for the size and the external path length of tries (1988)

  10. Janson, S.: On the tails of the limiting quicksort distribution. Electron. Commun. Prob. 20 (2015)

  11. Janson, S.: Tail bounds for sums of geometric and exponential variables. Stat. Prob. Lett. 135, 1–6 (2018)

  12. Kirschenhofer, P., Prodinger, H., Szpankowski, W.: On the variance of the external path length in a symmetric digital trie. Discrete Appl. Math. 25(1–2), 129–143 (1989)

  13. Knuth, D.E.: The Art of Computer Programming, vol. III, 2nd edn. Addison-Wesley, Boston (1998)

  14. Mahmoud, H., Flajolet, P., Jacquet, P., Régnier, M.: Analytic variations on bucket selection and sorting. Acta Informatica 36(9–10), 735–760 (2000)

  15. Mahmoud, H.M., Lueker, G.S.: Evolution of Random Search Trees, vol. 200. Wiley, New York (1992)

  16. McDiarmid, C., Hayward, R.: Large deviations for quicksort. J. Algorithms 21(3), 476–507 (1996). https://doi.org/10.1006/jagm.1996.0055

  17. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, Cambridge (2017)

  18. Régnier, M.: A limiting distribution for quicksort. RAIRO-Theoretical Inform. Appl.-Informatique Théorique et Appl. 23(3), 335–343 (1989)

  19. Sanders, P., Mehlhorn, K., Dietzfelbinger, M., Dementiev, R.: Sorting and selection. In: Sequential and Parallel Algorithms and Data Structures, pp. 153–210. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25209-0_5

  20. Sedgewick, R., Flajolet, P.: An Introduction to the Analysis of Algorithms. Pearson Education India, Chennai (2013)

  21. Seidel, R.: Data-specific analysis of string sorting. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1278–1286. Society for Industrial and Applied Mathematics (2010)

  22. Szpankowski, W.: Average Case Analysis of Algorithms on Sequences, vol. 50. John Wiley & Sons, New York (2011)

  23. Vitter, J.S., Flajolet, P.: Average-case analysis of algorithms and data structures. In: Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, pp. 431–524 (1990)

Author information

Corresponding author

Correspondence to Ioana O. Bercea.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bercea, I.O., Even, G. (2021). Upper Tail Analysis of Bucket Sort and Random Tries. In: Calamoneri, T., Corò, F. (eds) Algorithms and Complexity. CIAC 2021. Lecture Notes in Computer Science, vol 12701. Springer, Cham. https://doi.org/10.1007/978-3-030-75242-2_8

  • DOI: https://doi.org/10.1007/978-3-030-75242-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75241-5

  • Online ISBN: 978-3-030-75242-2

  • eBook Packages: Computer Science (R0)
