External String Sorting: Faster and Cache-Oblivious

Conference paper, STACS 2006

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3884)

Abstract

We give a randomized algorithm for sorting strings in external memory. For K binary strings comprising N words in total, our algorithm finds the sorted order and the longest common prefix sequence of the strings using \(O(\frac{K}{B}\log_{M/B}(\frac{K}{M})\log(\frac{N}{K}) + \frac{N}{B})\) I/Os. This bound is never worse than \(O(\frac{K}{B}\log_{M/B}(\frac{K}{M})\log\log_{M/B}(\frac{K}{M}) + \frac{N}{B})\) I/Os, and improves on the (deterministic) algorithm of Arge et al. (On sorting strings in external memory, STOC ’97). The error probability of the algorithm can be chosen as \(O(N^{-c})\) for any positive constant c. The algorithm even works in the cache-oblivious model under the tall cache assumption, i.e., assuming \(M > B^{1+\varepsilon}\) for some \(\varepsilon > 0\). An implication of our result is improved construction algorithms for external memory string dictionaries.
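
To make the stated output concrete, the following is a minimal in-memory sketch of what "the sorted order and the longest common prefix sequence" could look like for a handful of binary strings. It is not the external-memory algorithm of the paper and says nothing about I/O cost. The function names `lcp_length` and `sort_with_lcp` are hypothetical, and strings are modelled as Python `str` values over the characters `0` and `1`, with prefix lengths counted in characters rather than machine words; both are assumptions made for illustration.

```python
from typing import List, Tuple


def lcp_length(a: str, b: str) -> int:
    """Length (in characters) of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def sort_with_lcp(strings: List[str]) -> Tuple[List[int], List[int]]:
    """Naive in-memory reference (assumption: everything fits in RAM).

    order[i] is the input index of the i-th smallest string.
    lcp[i] is the common-prefix length of the i-th and (i+1)-th
    strings in sorted order, so len(lcp) == len(strings) - 1.
    """
    order = sorted(range(len(strings)), key=lambda i: strings[i])
    lcp = [
        lcp_length(strings[order[i]], strings[order[i + 1]])
        for i in range(len(order) - 1)
    ]
    return order, lcp


strings = ["0110", "0101", "1", "011"]
order, lcp = sort_with_lcp(strings)
print(order)  # [1, 3, 0, 2]  i.e. "0101" < "011" < "0110" < "1"
print(lcp)    # [2, 3, 0]
```

Returning indices rather than the strings themselves mirrors the idea that the output is an ordering of the K inputs, whose size depends on K and not on the total length N.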

References

  1. Aggarwal, A., Vitter, J.S.: The Input/Output complexity of sorting and related problems. Communications of the ACM 31(9), 1116–1127 (1988)

  2. Andersson, A., Hagerup, T., Nilsson, S., Raman, R.: Sorting in linear time? J. Comput. System Sci. 57(1), 74–93 (1998)

  3. Andersson, A., Nilsson, S.: A new efficient radix sort. In: Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS 1994), pp. 714–721. IEEE Comput. Soc. Press, Los Alamitos (1994)

  4. Arge, L.: External memory data structures. In: Abello, J., Pardalos, P.M., Resende, M.G.C. (eds.) Handbook of Massive Data Sets, pp. 313–358. Kluwer Academic Publishers, Dordrecht (2002)

  5. Arge, L., Bender, M.A., Demaine, E.D., Holland-Minkley, B., Munro, J.I.: Cache-oblivious priority queue and graph algorithm applications. In: ACM (ed.) Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC 2002), pp. 268–276. ACM Press, New York (2002)

  6. Arge, L., Brodal, G.S., Fagerberg, R.: Cache-oblivious data structures. In: Mehta, D., Sahni, S. (eds.) Handbook on Data Structures and Applications, CRC Press, Boca Raton (2005)

  7. Arge, L., Ferragina, P., Grossi, R., Vitter, J.S.: On sorting strings in external memory (extended abstract). In: ACM (ed.) Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC 1997), pp. 540–548. ACM Press, New York (1997)

  8. Bentley, J., Sedgewick, R.: Fast algorithms for sorting and searching strings. In: Proc. 8th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 360–369 (1997)

  9. Brodal, G.S.: Cache-oblivious algorithms and data structures. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, pp. 3–13. Springer, Heidelberg (2004)

  10. Brodal, G.S., Fagerberg, R.: Cache oblivious distribution sweeping. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 426–438. Springer, Heidelberg (2002)

  11. Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proc. 35th Annual ACM Symposium on Theory of Computing, pp. 307–315 (2003)

  12. Brodal, G.S., Fagerberg, R.: Cache-oblivious string dictionaries. In: Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2006) (to appear, 2006)

  13. Demaine, E.D.: Cache-oblivious data structures and algorithms. In: Proc. EFF summer school on massive data sets. LNCS, Springer, Heidelberg (to appear)

  14. Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. ACM 47(6), 987–1011 (2000)

  15. Ferragina, P., Grossi, R.: The string B-tree: a new data structure for string search in external memory and its applications. J. ACM 46(2), 236–280 (1999)

  16. Fredman, M.L., Willard, D.E.: Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. System Sci. 48(3), 533–551 (1994)

  17. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache oblivious algorithms. In: 40th Annual IEEE Symposium on Foundations of Computer Science, pp. 285–298. IEEE Computer Society Press, Los Alamitos (1999)

  18. Han, Y., Thorup, M.: Integer sorting in \(O(n\sqrt{\log\log n})\) expected time and linear space. In: Proceedings of the 43rd Annual Symposium on Foundations of Computer Science (FOCS 2002), pp. 135–144 (2002)

  19. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003)

  20. Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees and arrays. In: Proceedings of the 4th Annual ACM Symposium on Theory of Computing (STOC 1972), pp. 125–136 (1972)

  21. Meyer, U., Sanders, P., Sibeyn, J.F. (eds.): Algorithms for Memory Hierarchies. LNCS, vol. 2625. Springer, Berlin (2003)

  22. Morrison, D.R.: PATRICIA - practical algorithm to retrieve information coded in alphanumeric. J. ACM 15(4), 514–534 (1968)

  23. Vitter, J.S.: External memory algorithms and data structures: Dealing with MASSIVE data. ACM Computing Surveys 33(2), 209–271 (2001)

  24. Vitter, J.S.: Geometric and spatial data structures in external memory. In: Mehta, D., Sahni, S. (eds.) Handbook on Data Structures and Applications, CRC Press, Boca Raton (2005)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fagerberg, R., Pagh, A., Pagh, R. (2006). External String Sorting: Faster and Cache-Oblivious. In: Durand, B., Thomas, W. (eds) STACS 2006. STACS 2006. Lecture Notes in Computer Science, vol 3884. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11672142_4

  • DOI: https://doi.org/10.1007/11672142_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32301-3

  • Online ISBN: 978-3-540-32288-7

  • eBook Packages: Computer Science, Computer Science (R0)
