Linearized Suffix Tree: an Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays

Algorithmica

Abstract

Suffix trees and suffix arrays are fundamental full-text index data structures for solving problems in string processing. Because the two structures have different capabilities, some problems are solved more efficiently with suffix trees and others with suffix arrays. We consider efficient index data structures that offer the capabilities of both suffix trees and suffix arrays without requiring much space. When the alphabet is small, enhanced suffix arrays are such index data structures. When the alphabet is large, however, enhanced suffix arrays lose the power of suffix trees: pattern searching in an enhanced suffix array takes O(m|Σ|) time, whereas pattern searching in a suffix tree takes O(m log |Σ|) time, where m is the length of the pattern and Σ is the alphabet.
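As a point of reference for the structures discussed above, the following is a minimal sketch of a plain suffix array with binary-search pattern matching. It is not the enhanced suffix array or any structure from the paper: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is deliberately naive (sorting whole suffixes), and the search runs in O(m log n) time for a text of length n.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction for illustration only: it sorts full suffixes, which is
    far slower than the linear-time algorithms used in practice.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions where pattern occurs in text.

    Suffixes that start with pattern form one contiguous block of sa, so two
    binary searches over length-m prefixes locate the block boundaries.
    """
    m = len(pattern)

    # First binary search: leftmost suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Second binary search: leftmost suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Each comparison in the search inspects at most m characters, which is where the O(m log n) bound comes from. The enhanced suffix array and the linearized suffix tree described in the abstract add auxiliary tables on top of such an array to emulate suffix tree traversal.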

In this paper, we present linearized suffix trees, which are efficient index data structures with the capabilities of both suffix trees and suffix arrays even when the alphabet is large. A linearized suffix tree has all the functionalities of the enhanced suffix array and supports pattern search in O(m log |Σ|) time. From a different point of view, it can be considered a practical implementation of the suffix tree that supports O(m log |Σ|)-time pattern search.
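The gap between the O(m|Σ|) and O(m log |Σ|) bounds comes down to how quickly a search can pick, at each node, the child whose edge starts with the next pattern character. The sketch below illustrates that difference on a toy node, assuming the children are kept sorted by first character. The layout and the names `find_child_linear` and `find_child_binary` are hypothetical and are not the child table of the enhanced suffix array or of the linearized suffix tree.

```python
from bisect import bisect_left


def find_child_linear(children, c):
    """Scan sorted (first_char, child_id) pairs one sibling at a time.

    Worst case O(|Σ|) per pattern character, which yields O(m|Σ|) overall.
    """
    for first_char, child_id in children:
        if first_char == c:
            return child_id
        if first_char > c:
            break  # siblings are sorted, so c cannot appear later
    return None


def find_child_binary(first_chars, child_ids, c):
    """Binary-search the sorted first characters of the children.

    O(log |Σ|) per pattern character, which yields O(m log |Σ|) overall.
    """
    i = bisect_left(first_chars, c)
    if i < len(first_chars) and first_chars[i] == c:
        return child_ids[i]
    return None


if __name__ == "__main__":
    # Hypothetical node with four children over a DNA-like alphabet.
    first_chars = ["a", "c", "g", "t"]
    child_ids = [10, 11, 12, 13]
    children = list(zip(first_chars, child_ids))
    print(find_child_linear(children, "g"))
    print(find_child_binary(first_chars, child_ids, "g"))
```

Both functions return the same child. Only the number of siblings inspected differs, and that difference grows with the alphabet size.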

In addition, we present two efficient algorithms for computing suffix links on the enhanced suffix array and the linearized suffix tree. These are the first algorithms that run in O(n) time without using range minima queries. Our experimental results show that our algorithms are faster than the previous algorithms.

Author information

Corresponding author

Correspondence to Heejin Park.

About this article

Cite this article

Kim, D.K., Kim, M. & Park, H. Linearized Suffix Tree: an Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays. Algorithmica 52, 350–377 (2008). https://doi.org/10.1007/s00453-007-9061-2

  • DOI: https://doi.org/10.1007/s00453-007-9061-2
