Abstract
We give a randomized algorithm for sorting strings in external memory. For K binary strings comprising N words in total, our algorithm finds the sorted order and the longest common prefix sequence of the strings using \(O(\frac{K}{B}log_{M/B}(\frac{K}{M})log(\frac{N}{K}) + \frac{N}{B})\) I/Os. This bound is never worse than \(O(\frac{K}{B}log_{M/B}(\frac{K}{M})log log_{M/B}(\frac{K}{M}) + \frac{N}{B})\) I/Os, and improves on the (deterministic) algorithm of Arge et al. (On sorting strings in external memory, STOC ’97). The error probability of the algorithm can be chosen as O(N \(^{\rm -{\it c}}\)) for any positive constant c. The algorithm even works in the cache-oblivious model under the tall cache assumption, i.e,, assuming M > B 1 + ε for some ε > 0. An implication of our result is improved construction algorithms for external memory string dictionaries.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Aggarwal, A., Vitter, J.S.: The Input/Output complexity of sorting and related problems. Communications of the ACM 31(9), 1116–1127 (1988)
Andersson, A., Hagerup, T., Nilsson, S., Raman, R.: Sorting in linear time? J. Comput. System Sci. 57(1), 74–93 (1998)
Andersson, A., Nilsson, S.: A new efficient radix sort. In: Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS 1994), pp. 714–721. IEEE Comput. Soc. Press, Los Alamitos (1994)
Arge, L.: External memory data structures. In: Abello, J., Pardalos, P.M., Resende, M.G.C. (eds.) Handbook of Massive Data Sets, pp. 313–358. Kluwer Academic Publishers, Dordrecht (2002)
Arge, L., Bender, M.A., Demaine, E.D., Holland-Minkley, B., Munro, J.I.: Cache-oblivious priority queue and graph algorithm applications. In: ACM. (ed.) Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC 2002), pp. 268–276. ACM Press, New York (2002)
Arge, L., Brodal, G.S., Fagerberg, R.: Cache-oblivious data structures. In: Mehta, D., Sahni, S. (eds.) Handbook on Data Structures and Applications, CRC Press, Boca Raton (2005)
Arge, L., Ferragina, P., Grossi, R., Vitter, J.S.: On sorting strings in external memory (extended abstract). In: ACM (ed.) Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC 1997), pp. 540–548. ACM Press, New York (1997)
Bentley, J., Sedgewick, R.: Fast algorithms for sorting and searching strings. In: Proc. 8th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 360–369 (1997)
Brodal, G.S.: Cache-oblivious algorithms and data structures. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, pp. 3–13. Springer, Heidelberg (2004)
Brodal, G.S., Fagerberg, R.: Cache oblivious distribution sweeping. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 426–438. Springer, Heidelberg (2002)
Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proc. 35th Annual ACM Symposium on Theory of Computing, pp. 307–315 (2003)
Brodal, G.S., Fagerberg, R.: Cache-oblivious string dictionaries. In: Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2006) (to appear, 2006)
Demaine, E.D.: Cache-oblivious data structures and algorithms. In: Proc. EFF summer school on massive data sets. LNCS, Springer, Heidelberg (to appear)
Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. ACM 47(6), 987–1011 (2000)
Ferragina, P., Grossi, R.: The string B-tree: a new data structure for string search in external memory and its applications. J. ACM 46(2), 236–280 (1999)
Fredman, M.L., Willard, D.E.: Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. System Sci. 48(3), 533–551 (1994)
Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache oblivious algorithms. In: 40th Annual IEEE Symposium on Foundations of Computer Science, pp. 285–298. IEEE Computer Society Press, Los Alamitos (1999)
Han, Y., Thorup, M.: Integer sorting in \(O(n\sqrt{\log\log n})\) expected time and linear space. In: Proceedings of the 43rd Annual Symposium on Foundations of Computer Science (FOCS 2002), pp. 135–144 (2002)
Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003)
Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees and arrays. In: Proceedings of the 4th Annual ACM Symposium on Theory of Computing (STOC 2072), pp. 125–136 (1972)
Meyer, U., Sanders, P., Sibeyn, J.F. (eds.): Algorithms for Memory Hierarchies. LNCS, vol. 2625. Springer, Berlin (2003)
Morrison, D.R.: PATRICIA - practical algorithm to retrieve information coded in alphanumeric. J. ACM 15(4), 514–534 (1968)
Vitter, J.S.: External memory algorithms and data structures: Dealing with MASSIVE data. ACM Computing Surveys 33(2), 209–271 (2001)
Vitter, J.S.: Geometric and spatial data structures in external memory. In: Mehta, D., Sahni, S. (eds.) Handbook on Data Structures and Applications, CRC Press, Boca Raton (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fagerberg, R., Pagh, A., Pagh, R. (2006). External String Sorting: Faster and Cache-Oblivious. In: Durand, B., Thomas, W. (eds) STACS 2006. STACS 2006. Lecture Notes in Computer Science, vol 3884. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11672142_4
Download citation
DOI: https://doi.org/10.1007/11672142_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32301-3
Online ISBN: 978-3-540-32288-7
eBook Packages: Computer ScienceComputer Science (R0)