Sparse and Truncated Suffix Trees on Variable-Length Codes

  • Conference paper
Combinatorial Pattern Matching (CPM 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6661)

Abstract

The sparse suffix tree (SST), introduced by Kärkkäinen and Ukkonen (COCOON 1996), is the suffix tree for a subset of all suffixes of an input text T of length n. In this paper, we study the special case in which the input string is a sequence of k codewords drawn from a regular prefix code Δ ⊆ Σ⁺ recognized by a finite automaton, and the index points are located on the codeword boundaries. For this case, we present an online algorithm that constructs the sparse suffix tree for an input string T on any variable-length regular prefix code, called the code suffix tree (CST), in O(n + m) time and O(k) additional space for a fixed base alphabet Σ, where m is the size of an automaton for Δ. Furthermore, we present a modified algorithm for the ℓ-truncated version of code suffix trees that runs within the same time and space bounds. These results generalize the previous results of Inenaga and Takeda (CPM 2006) for word suffix trees and of Na, Apostolico, Iliopoulos, and Park (Theor. Comp. Sci. 304, 2003) for truncated suffix trees to arbitrary variable-length regular prefix codes, such as Huffman codes and multi-byte codes (e.g. UTF-8).
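
To make the indexed object concrete, the following Python sketch builds a naive, uncompressed trie over only those suffixes that start on codeword boundaries. It is not the online O(n + m) construction of the paper: it takes quadratic time and space, and it decodes with a set lookup instead of a finite automaton. The toy prefix code {0, 10, 11}, the example text, and all function names are assumptions chosen for illustration.

```python
class Node:
    """Trie node; `starts` lists the index points whose suffix passes through it."""
    def __init__(self):
        self.children = {}
        self.starts = []

def codeword_boundaries(text, code):
    """Return the start position of every codeword, decoding left to right.

    Because `code` is a prefix code, at most one codeword matches at each position.
    """
    max_len = max(len(w) for w in code)
    starts, i = [], 0
    while i < len(text):
        for length in range(1, max_len + 1):
            if text[i:i + length] in code:
                starts.append(i)
                i += length
                break
        else:
            raise ValueError(f"not a codeword sequence at position {i}")
    return starts

def build_sparse_suffix_trie(text, starts):
    """Insert only the suffixes beginning at the given index points (naive)."""
    root = Node()
    for s in starts:
        node = root
        node.starts.append(s)
        for ch in text[s:]:
            node = node.children.setdefault(ch, Node())
            node.starts.append(s)
    return root

def find(root, pattern):
    """Return the codeword-aligned positions at which `pattern` occurs."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.starts

code = {"0", "10", "11"}          # toy prefix code over the base alphabet {0, 1}
text = "100110"                   # decodes as 10 | 0 | 11 | 0
starts = codeword_boundaries(text, code)
root = build_sparse_suffix_trie(text, starts)
print(starts)                     # [0, 2, 3, 5]
print(find(root, "10"))           # [0]
```

In this example the string 10 also occurs at position 4 of the text, but that occurrence straddles the codewords 11 and 0, so it is not an index point and the sparse structure does not report it. Only k suffixes are indexed rather than n.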

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. of Discrete Algorithms 2(1), 53–86 (2004)

  2. Amir, A., Chencinski, E., Iliopoulos, C.S., Kopelowitz, T., Zhang, H.: Property matching and weighted matching. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 188–199. Springer, Heidelberg (2006)

  3. Andersson, A., Larsson, N.J., Swanson, K.: Suffix trees on words. Algorithmica 23(3), 246–260 (1999)

  4. Crochemore, M., Rytter, W.: Jewels of Stringology: Text Algorithms (2002)

  5. Ferragina, P., Fischer, J.: Suffix arrays on words. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 328–339. Springer, Heidelberg (2007)

  6. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge (1997)

  7. IETF, UTF-8, a transformation format of ISO 10646, RFC 3629 (2003), http://tools.ietf.org/html/rfc3629

  8. Inenaga, S., Takeda, M.: On-line linear-time construction of word suffix trees. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 60–71. Springer, Heidelberg (2006)

  9. Inenaga, S., Takeda, M.: Sparse directed acyclic word graphs. In: Crestani, F., Ferragina, P., Sanderson, M. (eds.) SPIRE 2006. LNCS, vol. 4209, pp. 61–73. Springer, Heidelberg (2006)

  10. Inenaga, S., Takeda, M.: Sparse compact directed acyclic word graphs. In: Holub, J., Zdarek, J. (eds.) Proc. PSC 2006, pp. 197–211 (2006)

  11. Kärkkäinen, J., Ukkonen, E.: Sparse suffix trees. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 219–230. Springer, Heidelberg (1996)

  12. Kasai, T., Lee, G.H., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)

  13. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23, 262–272 (1976)

  14. Na, J.C., Apostolico, A., Iliopoulos, C.S., Park, K.: Truncated suffix trees and their application to data compression. Theoretical Computer Science 304, 87–101 (2003)

  15. Takeda, M., Miyamoto, S., Kida, T., Shinohara, A., Fukamachi, S., Shinohara, T., Arikawa, S.: Processing text files as is: Pattern matching over compressed texts, multi-byte character texts, and semi-structured texts. In: Laender, A.H.F., Oliveira, A.L. (eds.) SPIRE 2002. LNCS, vol. 2476, p. 170. Springer, Heidelberg (2002)

  16. Uemura, T., Arimura, H.: A linear-time off-line construction of property suffix trees. IEICE Trans. Inf. & Syst. J91-D(3), 595–607 (2008) (in Japanese). An English version appears in Chapter 4, T. Uemura, Efficient Construction of Constrained Suffix Trees, Ph.D thesis, IST, Hokkaido Univ. (February 2011) (submitting)

  17. Ukkonen, E.: On-line construction of suffix-trees. Algorithmica 14(3), 249–260 (1995)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Uemura, T., Arimura, H. (2011). Sparse and Truncated Suffix Trees on Variable-Length Codes. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_22

  • DOI: https://doi.org/10.1007/978-3-642-21458-5_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21457-8

  • Online ISBN: 978-3-642-21458-5

  • eBook Packages: Computer Science, Computer Science (R0)
