Skip to main content

A New Compressed Suffix Tree Supporting Fast Search and Its Construction Algorithm Using Optimal Working Space

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 3537))

Abstract

The compressed suffix array and the compressed suffix tree for a given string S are full-text index data structures occupying O(nlog|Σ|) bits where n is the length of S and Σ is the alphabet from which symbols of S are drawn. When they were first introduced, they were constructed from suffix arrays and suffix trees, which implies they were not constructed in optimal O(nlog|Σ|)-bit working space. Recently, several methods were developed for constructing compressed suffix arrays and compressed suffix trees in optimal working space. By these methods, one can construct compressed suffix trees supporting the pattern search in O(m′ |Σ|) time where m′ = m logε n, m is the length of a pattern, and logε n is the time to find the ith smallest suffix of S from the compressed suffix array for any fixed 0 < ε ≤ 1. However, compressed suffix trees supporting the pattern search in O(m′ log|Σ| ) time are not constructed by these methods.

In this paper, we present a new compressed suffix tree supporting O(m′ log|Σ|)-time pattern search and its construction algorithm using optimal working space. To obtain this result, we developed a new succinct representation of the suffix trees, which is different from the classic succinct representation of parentheses encoding of the suffix trees. Our succinct representation technique can be generally applicable to succinct representation of other search trees.

This work was supported by Korea Research Foundation grant KRF-2003-03-D00343.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. of Discrete Algorithms, 53–86 (2004)

    Google Scholar 

  2. Abouelhoda, M., Ohlebusch, E., Kurtz, S.: Optimal exact string matching based on suffix arrays. In: Laender, A.H.F., Oliveira, A.L. (eds.) SPIRE 2002. LNCS, vol. 2476, pp. 31–43. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  3. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press, Cambridge (2001)

    MATH  Google Scholar 

  4. Elias, P.: Efficient storage and retrieval by content and address of static files. J. Assoc. Comput. Mach. 21, 246–260 (1974)

    MATH  MathSciNet  Google Scholar 

  5. Elias, P.: Universal codeword sets and representation of the integers. IEEE. Trans. Inform. Theory 21, 194–203 (1975)

    Article  MATH  MathSciNet  Google Scholar 

  6. Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. Assoc. Comput. Mach. 47, 987–1011 (2000)

    MATH  MathSciNet  Google Scholar 

  7. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: FOCS, pp. 390–398 (2001)

    Google Scholar 

  8. Ferragina, P., Manzini, G.: An experimental study of an opportunistic index. In: SODA, pp. 269–278 (2001)

    Google Scholar 

  9. Ferragina, P., Manzini, G., Makinen, V., Navarro, G.: An Alphabet-Friendly FM-index. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 150–160. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  10. Gonnet, G., Baeza-Yates, R., Snider, T.: New indices for text: Pat trees and pat arrays. In: Frakes, W.B., Baeza-Yates, R.A. (eds.) Information Retrieval: Data Structures & Algorithms, pp. 66–82. Prentice Hall, Englewood Cliffs (1992)

    Google Scholar 

  11. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: SODA, pp. 841–850 (2003)

    Google Scholar 

  12. Grossi, R., Gupta, A., Vitter, J.S.: When indexing equals compression: Experiments with compressing suffix arrays and applications. In: SODA (2004)

    Google Scholar 

  13. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In: STOC, pp. 397–406 (2000)

    Google Scholar 

  14. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge Univ. Press, Cambridge (1997)

    Book  MATH  Google Scholar 

  15. Hon, W.K., Sadakane, K.: Space-economical algorithms for finding maximal unique matches. In: Apostolico, A., Takeda, M. (eds.) CPM 2002. LNCS, vol. 2373, pp. 144–152. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  16. Hon, W.K., Sadakane, K., Sung, W.K.: Breaking a time-and-space barrier in constructing full-text indices. In: FOCS, pp. 251–260 (2003)

    Google Scholar 

  17. Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longestcommon- prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)

    Chapter  Google Scholar 

  18. Kärkkäinen, J., Sanders, P.: Simpler linear work suffix array construction. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 943–955. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  19. Kim, D.K., Jo, J., Park, H.: A fast algorithm for constructing suffix arrays for fixed-size alphabets. In: Ribeiro, C.C., Martins, S.L. (eds.) WEA 2004. LNCS, vol. 3059, pp. 301–314. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  20. Kim, D.K., Jeon, J.E., Park, H.: An efficient index data structre with the capabilities of suffix trees and suffix arrays for alphabets of non-negligible size. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 138–149. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  21. Kim, D.K., Kim, M., Park, H.: Linearized suffix tree: an efficient index data structre with the capabilities of suffix trees and suffix arrays (manuscript, 2004)

    Google Scholar 

  22. Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 186–199. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  23. Ko, P., Aluru, S.: Space-efficient linear time construction of suffix arrays. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 200–210. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  24. Lam, T.K., Sadakane, K., Sung, W.K., Yiu, S.M.: A space and time efficient algorithm for constructing compressed suffix arrays. In: Ibarra, O.H., Zhang, L. (eds.) COCOON 2002. LNCS, vol. 2387, pp. 401–410. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  25. Manber, U., Myers, G.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22, 935–938 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  26. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. Assoc. Comput. Mach. 23, 262–272 (1976)

    MATH  MathSciNet  Google Scholar 

  27. Munro, J.I., Raman, V.: Succinct representation of balanced parentheses and static trees. SIAM J. on Comput. 31, 762–776 (2001)

    Article  MATH  MathSciNet  Google Scholar 

  28. Munro, J.I., Raman, V., Rao, S.S.: Space efficient suffix trees. J. of Algorithms 39, 205–222 (2001)

    Article  MATH  MathSciNet  Google Scholar 

  29. Sadakane, K.: Succinct representations of lcp Information and improvements in the compressed suffix arrays. In: SODA, pp. 225–232 (2002)

    Google Scholar 

  30. Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. J. of Algorithms 48, 294–313 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  31. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)

    Article  MATH  MathSciNet  Google Scholar 

  32. Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th IEEE Symp. Switching and Automata Theory, pp. 1–11 (1973)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, D.K., Park, H. (2005). A New Compressed Suffix Tree Supporting Fast Search and Its Construction Algorithm Using Optimal Working Space. In: Apostolico, A., Crochemore, M., Park, K. (eds) Combinatorial Pattern Matching. CPM 2005. Lecture Notes in Computer Science, vol 3537. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11496656_4

Download citation

  • DOI: https://doi.org/10.1007/11496656_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26201-5

  • Online ISBN: 978-3-540-31562-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics