Abstract
The sparse suffix trees (SST), introduced by (Kärkkäinen and Ukkonen, COCOON 1996), is the suffix tree for a subset of all suffixes of an input text T of length n. In this paper, we study a special case that an input string is a sequence of k codewords drawn from a regular prefix code Δ ⊆ Σ + recognized by a finite automaton, and index points locate on the code boundaries. In this case, we present an online algorithm that constructs the sparse suffix tree for an input string T on any variable-length regular prefix code, called the code suffix tree (CST), in O(n + m) time and O(k) additional space for a fixed base alphabet Σ, where m is the size of an automaton for Δ. Furthermore, we present a modified algorithm for ℓ-truncated version of code suffix trees that runs in the same time and space complexities. Hence, these results generalize the previous results (Inenaga and Takeda, CPM 2006) for word suffix trees and (Na, Apostolico, Iliopoulos, and Park, Theor. Comp. Sci., 304, 2003) for truncated suffix trees on arbitrary variable-length regular prefix codes, such as Huffman codes and multi-byte codes (e.g. UTF-8).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. of Discrete Algorithms 2(1), 53–86 (2004)
Amir, A., Chencinski, E., Iliopoulos, C.S., Kopelowitz, T., Zhang, H.: Property matching and weighted matching. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 188–199. Springer, Heidelberg (2006)
Andersson, A., Larsson, N.J., Swanson, K.: Suffix trees on words. Algorithmica 23(3), 246–260 (1999)
Crochemore, M., Rytter, W.: Jewels of Stringology: Text Algorithms (2002)
Ferragina, P., Fischer, J.: Suffix arrays on words. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 328–339. Springer, Heidelberg (2007)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences, – Computer Science and Computational Biology, Cambridge (1997)
IETF, UTF-8, a transformation format of ISO 10646, RFC 3629 (2003), http://tools.ietf.org/html/rfc3629
Inenaga, S., Takeda, M.: On-line linear-time construction of word suffix trees. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 60–71. Springer, Heidelberg (2006)
Inenaga, S., Takeda, M.: Sparse directed acyclic word graphs. In: Crestani, F., Ferragina, P., Sanderson, M. (eds.) SPIRE 2006. LNCS, vol. 4209, pp. 61–73. Springer, Heidelberg (2006)
Inenaga, S., Takeda, M.: Sparse compact directed acyclic word graphs. In: Holub, J., Zdarek, J. (eds.) Proc. PSC 2006, pp. 197–211 (2006)
Kärkkäinen, J., Ukkonen, E.: Sparse suffix trees. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 219–230. Springer, Heidelberg (1996)
Kasai, T., Lee, G.H., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)
McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23, 262–272 (1976)
Na, J.C., Apostolico, A., Iliopoulos, C.S., Park, K.: Truncated suffix trees and their application to data compression. Theoretical Computer Science 304, 87–101 (2003)
Takeda, M., Miyamoto, S., Kida, T., Shinohara, A., Fukamachi, S., Shinohara, T., Arikawa, S.: Processing text files as is: Pattern matching over compressed texts, multi-byte character texts, and semi-structured texts. In: Laender, A.H.F., Oliveira, A.L. (eds.) SPIRE 2002. LNCS, vol. 2476, p. 170. Springer, Heidelberg (2002)
Uemura, T., Arimura, H.: A linear-time off-line construction of property suffix trees. IEICE Trans. Inf. & Syst. J91-D(3), 595–607 (2008) (in Japanese). An English version appears in Chapter 4, T. Uemura, Efficient Construction of Constrained Suffix Trees, Ph.D thesis, IST, Hokkaido Univ. (February 2011) (submitting)
Ukkonen, E.: On-line construction of suffix-trees. Algorithmica 14(3), 249–260 (1995)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Uemura, T., Arimura, H. (2011). Sparse and Truncated Suffix Trees on Variable-Length Codes. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_22
Download citation
DOI: https://doi.org/10.1007/978-3-642-21458-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21457-8
Online ISBN: 978-3-642-21458-5
eBook Packages: Computer ScienceComputer Science (R0)