Abstract
Suffix trees and suffix arrays are fundamental full-text index data structures to solve problems occurring in string processing. Since suffix trees and suffix arrays have different capabilities, some problems are solved more efficiently using suffix trees and others are solved more efficiently using suffix arrays. We consider efficient index data structures with the capabilities of both suffix trees and suffix arrays without requiring much space. When the size of an alphabet is small, enhanced suffix arrays are such index data structures. However, when the size of an alphabet is large, enhanced suffix arrays lose the power of suffix trees. Pattern searching in an enhanced suffix array takes O(m|Σ|) time while pattern searching in a suffix tree takes O(mlog |Σ|) time where m is the length of a pattern and Σ is an alphabet.
In this paper, we present linearized suffix trees which are efficient index data structures with the capabilities of both suffix trees and suffix arrays even when the size of an alphabet is large. A linearized suffix tree has all the functionalities of the enhanced suffix array and supports the pattern search in O(mlog |Σ|) time. In a different point of view, it can be considered a practical implementation of the suffix tree supporting O(mlog |Σ|)-time pattern search.
In addition, we also present two efficient algorithms for computing suffix links on the enhanced suffix array and the linearized suffix tree. These are the first algorithms that run in O(n) time without using the range minima query. Our experimental results show that our algorithms are faster than the previous algorithms.
Similar content being viewed by others
References
Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2, 53–86 (2004)
Abouelhoda, M., Ohlebusch, E., Kurtz, S.: Optimal exact string matching based on suffix arrays. In: Symposium on String Processing and Information Retrieval, pp. 31–43 (2002)
Aho, A., Hopcroft, J., Ullman, J.: Data Structures and Algorithms. Addison-Wesley, Reading (1983)
Burkhardt, S., Kärkkäinen, J.: Fast lightweight suffix array construction and checking. In: Symposium on Combinatorial Pattern Matching, pp. 55–69 (2003)
Chen, M.T., Seiferas, J.: Efficient and elegant subword tree construction. In: Apostolico, A., Galil, Z. (eds.) Combinatorial Algorithms on Words. NATO ASI Series F: Computer and System Sciences, pp. 97–107. Springer, Berlin (1985)
Clark, D., Munro, I.: Efficient suffix trees on secondary storage. In: SODA, pp. 383–391 (1996)
Colussi, L., Col, A.: A time and space efficient data structure for string searching on large texts. IPL 58(5), 217–222 (1996)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press, Cambridge (2001)
Crauser, A., Ferragina, P.: A theoretical and experimental study on the construction of suffix arrays in external memory. Algorithmica 32, 1–35 (2002)
Dementiev, R., Kärkkäinen, J., Mehnert, J., Sanders, P.: Better external memory suffix array construction. In: Workshop on Algorithm Engineering and Experiments (2005)
Farach, M.: Optimal suffix tree construction with large alphabets. In: IEEE Symposium on Foundations of Computer Science, pp. 137–143 (1997)
Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. Assoc. Comput. Mach. 47, 987–1011 (2000)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: IEEE Symposium on Foundations of Computer Science, pp. 390–398 (2001)
Giegerich, R., Kurtz, S.: A comparison of imperative and purely functional suffix tree construction. Sci. Comput. Program. 25, 187–218 (1995)
Giegerich, R., Kurtz, S.: From Ukkonen to McCreight and Weiner: a unifying view of linear-time suffix tree construction. Algorithmica 19, 331–353 (1997)
Gonnet, G., Baeza-Yates, R., Snider, T.: New indices for text: Pat trees and pat arrays. In: Frakes, W.B., Baeza-Yates, R.A. (eds.) Information Retrieval: Data Structures & Algorithms, pp. 66–82. Prentice-Hall, Englewood Cliffs (1992)
Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In: ACM Symposium on Theory of Computing, pp. 397–406 (2000)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997)
Hon, W.K., Sadakane, K., Sung, W.K.: Breaking a time-and-space barrier in constructing full-text indices. In: IEEE Symposium on Foundations of Computer Science, pp. 251–260 (2003)
Kärkkäinen, J.: Suffix cactus: a cross between suffix tree and suffix array. In: Symposium on Combinatorial Pattern Matching, pp. 191–204 (1995)
Kärkkäinen, J., Sanders, P.: Simpler linear work suffix array construction. In: International Colloqium on Automata Languages and Programming, pp. 943–955 (2003)
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Symposium on Combinatorial Pattern Matching, pp. 181–192 (2001)
Kim, D.K., Jo, J., Park, H.: A fast algorithm for constructing suffix arrays for fixed-size alphabets. In: Workshop on Efficient and Experimental Algorithms, pp. 301–314 (2004)
Kim, D.K., Park, K.: Linear-time construction of two-dimensional suffix trees. In: International Colloqium on Automata Languages and Programming, pp. 463–472 (1999)
Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Symposium on Combinatorial Pattern Matching, pp. 186–199 (2003)
Ko, P., Aluru, S.: Space-efficient linear time construction of suffix arrays. In: Symposium on Combinatorial Pattern Matching, pp. 200–210 (2003)
Kurtz, S.: Reducing the space requirement of suffix trees. Softw. Pract. Experience 29, 1149–1171 (1999)
Larsson, N.J., Sadakane, K.: Faster suffix sorting. Technical report No. LU-CS-TR:99-214, Department of Computer Science, Lund University, Sweden (1999)
Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22, 935–938 (1993)
Manzini, G., Ferragina, P.: Engineering a lightweight suffix array construction algorithm. Algorithmica 40, 33–50 (2004)
McCreight, E.M.: A space-economical suffix tree construction algorithm. J. Assoc. Comput. Mach. 23, 262–272 (1976)
Munro, J.I., Raman, V., Rao, S.S.: Space efficient suffix trees. J. Algorithms 39, 205–222 (2001)
Sadakane, K.: Compressed suffix trees with full functionality. Theory Comput. Syst. (2007, in press)
Schürmann, K., Stoye, J.: An incomplex algorithm for fast suffix array construction. Softw. Pract. Exp. 37(3), 309–329 (2007)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)
Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th IEEE Symposium on Switching and Automata Theory, pp. 1–11 (1973)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Kim, D.K., Kim, M. & Park, H. Linearized Suffix Tree: an Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays. Algorithmica 52, 350–377 (2008). https://doi.org/10.1007/s00453-007-9061-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00453-007-9061-2