Suffix trees on words

Andersson, Arne; Larsson, N. Jesper; Swanson, Kurt

doi:10.1007/3-540-61258-0_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1075)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n that has a natural partitioning into m multi-character substrings or words. This word suffix tree represents only the m suffixes that start at word boundaries. These boundaries are determined by delimiters, whose definition depends on the application. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In general, construction cost is Ω(n), since the entire input must be scanned. In applications that require strict node ordering, an additional cost of sorting O(m′) characters arises, where m′ is the number of distinct words. In either case, this is a significant improvement over previous solutions.
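To make the definition concrete, here is a minimal sketch of what a word suffix tree contains. It is not the authors' construction algorithm: it naively inserts the m word-boundary suffixes one at a time into a compacted trie, which costs O(n·m) time in the worst case rather than the O(n) expected time stated above. It does show why O(m) space is attainable, since each insertion adds one leaf and at most one internal node. Assumptions not taken from the paper: a position is a word boundary if it holds a non-delimiter character and is either position 0 or preceded by a delimiter; the default delimiter is a space; and a terminator `\0`, assumed absent from the text, is appended so that every word suffix ends at its own leaf. All names (`word_boundaries`, `insert`, `build_word_suffix_tree`, `leaves`, `occurrences`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

TERMINATOR = "\0"  # assumption: this character never occurs in the input text


@dataclass
class Node:
    # Edge map: first character of the edge label -> (start, end, child),
    # where the label is s[start:end] of the terminated text s.
    children: Dict[str, Tuple[int, int, "Node"]] = field(default_factory=dict)
    suffix_start: Optional[int] = None  # set only on leaves


def word_boundaries(text: str, delimiters: str = " ") -> List[int]:
    """Hypothetical rule: a non-delimiter at position 0 or right after a delimiter."""
    return [i for i, ch in enumerate(text)
            if ch not in delimiters and (i == 0 or text[i - 1] in delimiters)]


def insert(root: Node, s: str, start: int) -> None:
    """Naively insert the suffix s[start:] into the compacted trie."""
    node, i = root, start
    while True:
        entry = node.children.get(s[i])
        if entry is None:  # no edge begins with s[i]: attach a new leaf
            node.children[s[i]] = (i, len(s), Node(suffix_start=start))
            return
        lo, hi, child = entry
        k = 0
        while lo + k < hi and s[lo + k] == s[i + k]:
            k += 1
        if lo + k == hi:  # whole edge label matched: descend
            node, i = child, i + k
            continue
        # Mismatch inside the edge: split it with one new internal node.
        mid = Node()
        node.children[s[lo]] = (lo, lo + k, mid)
        mid.children[s[lo + k]] = (lo + k, hi, child)
        mid.children[s[i + k]] = (i + k, len(s), Node(suffix_start=start))
        return


def build_word_suffix_tree(text: str, delimiters: str = " ") -> Tuple[Node, str]:
    """Insert only the suffixes that start at word boundaries."""
    assert TERMINATOR not in text
    s = text + TERMINATOR
    root = Node()
    for start in word_boundaries(text, delimiters):
        insert(root, s, start)
    return root, s


def leaves(node: Node) -> List[int]:
    """Start positions of all word suffixes in the subtree rooted at node."""
    if node.suffix_start is not None:
        return [node.suffix_start]
    return [p for _, _, child in node.children.values() for p in leaves(child)]


def occurrences(root: Node, s: str, pattern: str) -> List[int]:
    """Start positions of word suffixes that begin with pattern."""
    node, j = root, 0
    while j < len(pattern):
        entry = node.children.get(pattern[j])
        if entry is None:
            return []
        lo, hi, child = entry
        k = 0
        while lo + k < hi and j + k < len(pattern):
            if s[lo + k] != pattern[j + k]:
                return []
            k += 1
        node, j = child, j + k
    return sorted(leaves(node))


root, s = build_word_suffix_tree("to be or not to be")
print(occurrences(root, s, "to"))  # [0, 13]
print(occurrences(root, s, "o"))   # [6]
```

For "to be or not to be" the tree has m = 6 leaves, one per word. A query for "o" reports only position 6 ("or"), whereas a full suffix tree over all n suffixes would also report the non-boundary positions 1, 10 and 14.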

Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time.
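The small-alphabet remark rests on a counting argument: under the usual word-RAM assumption that a machine word holds w ≥ log₂ n bits, n characters from an alphabet of size σ fit in about n⌈log₂ σ⌉/w = O(n log σ / log n) words, which is o(n) when σ is constant. The sketch below only illustrates the word-level parallelism this enables; it is not the paper's sublinear construction. Assumptions not from the paper: a 4-letter DNA alphabet (2 bits per character), 64-bit words, and the hypothetical helpers `pack` and `lcp`. In the setting the abstract describes, the input would already be stored packed; `pack` here exists only to build test data and itself takes linear time.

```python
# Assumptions for illustration: a 4-letter DNA alphabet (2 bits per character)
# and 64-bit machine words; neither comes from the paper.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS = 2
W = 64
PER_WORD = W // BITS  # 32 characters per machine word


def pack(text: str) -> list:
    """Pack text into W-bit integers, first character in the most significant bits."""
    words = []
    for i in range(0, len(text), PER_WORD):
        chunk = text[i:i + PER_WORD]
        value = 0
        for ch in chunk:
            value = (value << BITS) | CODE[ch]
        # Left-align a short final chunk so character positions line up.
        value <<= BITS * (PER_WORD - len(chunk))
        words.append(value)
    return words


def lcp(a_text: str, b_text: str) -> int:
    """Longest common prefix, comparing PER_WORD characters per XOR."""
    a, b = pack(a_text), pack(b_text)
    # Padding bits decode as "A", so the result is clipped to the shorter length.
    limit = min(len(a_text), len(b_text))
    matched = 0
    for x, y in zip(a, b):
        diff = x ^ y
        if diff:
            # The leading zero bits of diff cover the characters that agree.
            matched += (W - diff.bit_length()) // BITS
            return min(matched, limit)
        matched += PER_WORD
    return min(matched, limit)


print(len(pack("A" * 100)))  # 4 words for 100 characters
print(lcp("ACGT" * 20, "ACGT" * 10 + "T"))  # 40
```

Each loop iteration settles up to 32 characters at once, which is the kind of saving that makes running time proportional to the number of machine words, rather than characters, plausible.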

Author information

Authors and Affiliations

Dept. of Computer Science, Lund University, Box 118, S-221 00 Lund, Sweden
Arne Andersson, N. Jesper Larsson & Kurt Swanson

Editor information

Dan Hirschberg, Gene Myers

About this paper

Cite this paper

Andersson, A., Larsson, N.J., Swanson, K. (1996). Suffix trees on words. In: Hirschberg, D., Myers, G. (eds) Combinatorial Pattern Matching. CPM 1996. Lecture Notes in Computer Science, vol 1075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61258-0_9

DOI: https://doi.org/10.1007/3-540-61258-0_9
Published: 01 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61258-2
Online ISBN: 978-3-540-68390-2
eBook Packages: Springer Book Archive
