Abstract
In 1990, Manber and Myers proposed suffix arrays as a space-saving alternative to suffix trees and described the first algorithms for suffix array construction and use. Since that time, and especially in the last few years, suffix array construction algorithms have proliferated in bewildering abundance. This survey paper attempts to provide simple high-level descriptions of these numerous algorithms that highlight both their distinctive features and their commonalities, while avoiding as much as possible the complexities of implementation details. New hybrid algorithms are also described. We provide comparisons of the algorithms' worst-case time complexity and use of additional space, together with results of recent experimental test runs on many of their implementations.
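For readers new to the data structure, here is a minimal sketch of what a suffix array is and how it supports string searching. It assumes the simplest possible construction, sorting all suffixes directly, which is not one of the algorithms the survey describes and takes far more time than they do. The function names `build_suffix_array` and `find_occurrences` are hypothetical and chosen only for illustration.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive illustration only: sorting whole suffixes costs O(n^2 log n) in the
    worst case, unlike the specialised construction algorithms in the survey.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start positions of pattern in text, using binary search on sa."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Suffixes in sa[start:lo] all begin with pattern; report positions in text order.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Because the suffixes are stored in sorted order, every occurrence of a pattern occupies one contiguous range of the array, which is why two binary searches are enough to locate them all.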
References
- Abouelhoda, M. I., Kurtz, S., and Ohlebusch, E. 2004. Replacing suffix trees with suffix arrays. J. Disc. Algor. 2, 1, 53--86.
- Apostolico, A. 1985. The myriad virtues of subword trees. In Combinatorial Algorithms on Words. NATO ASI Series F12. Springer-Verlag, Berlin, Germany, 85--96.
- Baron, D. and Bresler, Y. 2005. Antisequential suffix sorting for BWT-based data compression. IEEE Trans. Comput. 54, 4 (Apr.), 385--397.
- Bentley, J. L. and McIlroy, M. D. 1993. Engineering a sort function. Softw. Pract. Exper. 23, 11, 1249--1265.
- Bentley, J. L. and Sedgewick, R. 1997. Fast algorithms for sorting and searching strings. In Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms (New Orleans, LA). ACM, New York, 360--369.
- Burkhardt, S. and Kärkkäinen, J. 2003. Fast lightweight suffix array construction and checking. In Proceedings of the 14th Annual Symposium CPM 2003, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science, vol. 2676. Springer-Verlag, Berlin, Germany, 55--69.
- Burrows, M. and Wheeler, D. J. 1994. A block sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation, Palo Alto, CA.
- Crauser, A. and Ferragina, P. 2002. A theoretical and experimental study on the construction of suffix arrays in external memory. Algorithmica 32, 1--35.
- Farach, M. 1997. Optimal suffix tree construction for large alphabets. In Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society, Los Alamitos, CA, 137--143.
- Ferragina, P. and Grossi, R. 1999. The string b-tree: a new data structure for search in external memory and its applications. J. ACM 46, 2, 236--280.
- Grossi, R. and Vitter, J. S. 2005. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35, 2, 378--407.
- Hart, M. 1997. Project Gutenberg. http://www.gutenberg.net.
- Hon, W., Sadakane, K., and Sung, W. 2003. Breaking a time-and-space barrier in constructing full-text indices. In Proceedings of the 44th IEEE Symposium on Foundations of Computer Science (FOCS'03). IEEE Computer Society Press, Los Alamitos, CA, 251--260.
- Itoh, H. and Tanaka, H. 1999. An efficient method for in memory construction of suffix arrays. In Proceedings of the 6th Symposium on String Processing and Information Retrieval (Cancun, Mexico). IEEE Computer Society, Los Alamitos, CA, 81--88.
- Kärkkäinen, J. and Sanders, P. 2003. Simple linear work suffix array construction. In Proceedings of the 30th International Colloquium Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 2971. Springer-Verlag, Berlin, Germany, 943--955.
- Kärkkäinen, J., Sanders, P., and Burkhardt, S. 2006. Linear work suffix array construction. J. ACM 53, 6 (Nov.), 918--936.
- Karp, R. M., Miller, R. E., and Rosenberg, A. L. 1972. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the 4th Annual ACM Symposium on Theory of Computing (Denver, CO). ACM, New York, 125--136.
- Kasai, T., Lee, G., Arimura, H., Arikawa, S., and Park, K. 2001. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proceedings of the 12th Annual Symposium (CPM 2001). Lecture Notes in Computer Science, vol. 2089. Springer-Verlag, Berlin, Germany, 181--192.
- Khmelev, D. V. 2003. Program suffsort version 0.1.6. http://www.math.toronto.edu/dkhmelev/PROGS/tacu/suffsort-eng.html.
- Kim, D. K., Jo, J., and Park, H. 2004. A fast algorithm for constructing suffix arrays for fixed-size alphabets. In Proceedings of the 3rd Workshop on Experimental and Efficient Algorithms (WEA 2004), C. C. Ribeiro and S. L. Martins, Eds. Springer-Verlag, Berlin, Germany, 301--314.
- Kim, D. K., Sim, J. S., Park, H., and Park, K. 2003. Linear-time construction of suffix arrays. In Proceedings of the 14th Annual Symposium Combinatorial Pattern Matching, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science, vol. 2676. Springer-Verlag, Berlin, Germany, 186--199.
- Kim, D. K., Sim, J. S., Park, H., and Park, K. 2005. Constructing suffix arrays in linear time. J. Disc. Algor. 3, 126--142.
- Ko, P. 2006. Linear time suffix array. http://www.public.iastate.edu/~kopang/progRelease/homepage.html.
- Ko, P. and Aluru, S. 2003. Space efficient linear time construction of suffix arrays. In Proceedings of the 14th Annual Symposium CPM 2003, R. Baeza-Yates, E. Chávez, and M. Crochemore, Eds. Lecture Notes in Computer Science, vol. 2676. Springer-Verlag, Berlin, Germany, 200--210.
- Ko, P. and Aluru, S. 2005. Space efficient linear time construction of suffix arrays. J. Disc. Algor. 3, 143--156.
- Kurtz, S. 1999. Reducing the space requirement of suffix trees. Softw. Pract. Exper. 29, 13, 1149--1171.
- Larsson, J. N. and Sadakane, K. 1999. Faster suffix sorting. Tech. Rep. LU-CS-TR:99-214 {LUNFD6/(NFCS-3140)/1-20/(1999)}, Department of Computer Science, Lund University, Sweden.
- Lee, S. and Park, K. 2004. Efficient implementations of suffix array construction algorithms. In AWOCA 2004: Proceedings of the 15th Australasian Workshop on Combinatorial Algorithms, S. Hong, Ed. 64--72.
- Malyshev, D. 2006. DARK the universal archiver based on BWT-DC scheme. http://darchiver.narod.ru/.
- Manber, U. and Myers, G. W. 1990. Suffix arrays: A new method for on-line string searches. In Proceedings of the 1st ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, 319--327.
- Manber, U. and Myers, G. W. 1993. Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22, 5, 935--948.
- Maniscalco, M. A. 2005. MSufSort. http://www.michael-maniscalco.com/msufsort.htm.
- Maniscalco, M. A. and Puglisi, S. J. 2006. Faster lightweight suffix array construction. In Proceedings of the 17th Australasian Workshop on Combinatorial Algorithms, J. Ryan and Dafik, Eds. Univ. Ballarat, Ballarat, Victoria, Australia, 16--29.
- Maniscalco, M. A. and Puglisi, S. J. 2007. An efficient, versatile approach to suffix sorting. ACM J. Exper. Algor. To appear.
- Manzini, G. 2004. Two space saving tricks for linear time LCP computation. In Proceedings of the 9th Scandinavian Workshop on Algorithm Theory (SWAT '04), T. Hagerup and J. Katajainen, Eds. Lecture Notes in Computer Science, vol. 3111. Springer-Verlag, Berlin, Germany, 372--383.
- Manzini, G. and Ferragina, P. 2004. Engineering a lightweight suffix array construction algorithm. Algorithmica 40, 33--50.
- McIlroy, M. D. 1997. ssort.c. http://cm.bell-labs.com/cm/cs/who/doug/source.html.
- McIlroy, P. M., Bostic, K., and McIlroy, M. D. 1993. Engineering radix sort. Comput. Syst. 6, 1, 5--27.
- Mori, Y. 2006. DivSufSort. http://www.homepage3.nifty.com/wpage/software/libdivsufsort.html.
- Munro, J. I. 1996. Tables. In Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS). Lecture Notes in Computer Science, vol. 1180. Springer-Verlag, London, UK, 37--42.
- Na, J. C. 2005. Linear-time construction of compressed suffix arrays using O(n log n)-bit working space for large alphabets. In Proceedings of the 16th Annual Symposium Combinatorial Pattern Matching, A. Apostolico, M. Crochemore, and K. Park, Eds. Lecture Notes in Computer Science, vol. 3537. Springer-Verlag, Berlin, Germany, 57--67.
- Navarro, G. and Mäkinen, V. 2007. Compressed full-text indexes. ACM Comput. Surv. 39, 1 (Apr.), Article 2.
- Puglisi, S. J., Smyth, W. F., and Turpin, A. H. 2005. The performance of linear time suffix sorting algorithms. In Proceedings of the IEEE Data Compression Conference, M. Cohn and J. Storer, Eds. IEEE Computer Society Press, Los Alamitos, CA, 358--368.
- Sadakane, K. 1998. A fast algorithm for making suffix arrays and for Burrows-Wheeler transformation. In DCC: Data Compression Conference. IEEE Computer Society Press, Los Alamitos, CA, 129--138.
- Schürmann, K. and Stoye, J. 2005. An incomplex algorithm for fast suffix array construction. In Proceedings of the 7th Workshop on Algorithm Engineering and Experiments (ALENEX05). SIAM, 77--85.
- Seward, J. 2000. On the performance of BWT sorting algorithms. In DCC: Data Compression Conference. IEEE Computer Society Press, Los Alamitos, CA, 173--182.
- Sim, J. S., Kim, D. K., Park, H., and Park, K. 2003. Linear-time search in suffix arrays. In Proceedings of the 14th Australasian Workshop on Combinatorial Algorithms, M. Miller and K. Park, Eds. (Seoul, Korea), 139--146.
- Sinha, R. and Zobel, J. 2004. Cache-conscious sorting of large sets of strings with dynamic tries. ACM J. Exper. Algor. 9.
- Smyth, B. 2003. Computing Patterns in Strings. Pearson Addison-Wesley, Essex, England.
Index Terms
- A taxonomy of suffix array construction algorithms