Suffix Trees and Arrays

Apostolico, Alberto; Cunial, Fabio

doi:10.1007/978-1-4939-2864-4_627

Years and Authors of Summarized Original Work

1973; Weiner
1976; McCreight
1993; Manber, Myers
1995; Ukkonen

The suffix tree is one of the oldest full-text inverted indexes and one of the most persistent subjects of study in the theory of algorithms. With extensions and refinements, including succinct and compressed variants that retain some of its expressive power in smaller space, it is a fundamental conceptual tool in the design of string algorithms. Its companion structure, the suffix array, is as powerful as the suffix tree in many applications but requires significantly less space. The uses of these data structures are so numerous that it is difficult to account for all of them, and more are still being discovered. Salient applications include searching for a pattern in a text in time proportional to the length of the pattern, various computations on regularities such as repeats and palindromes within a text, statistical tables of substring occurrences,...
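As a minimal sketch of the suffix-array side of these ideas (not code from this entry), the Python below builds the suffix array of a short text, computes its LCP array with the linear-time method of Kasai et al. (2001), finds all occurrences of a pattern by binary search, and reads off the longest repeated substring from the pair of adjacent suffixes with the largest LCP. The function names are hypothetical. Two simplifications are assumed: construction sorts the suffixes directly instead of using a linear-time algorithm such as that of Kärkkäinen, Sanders and Burkhardt (2006), and search is plain binary search with O(m log n) character comparisons for a pattern of length m, rather than the O(m + log n) that Manber and Myers (1993) obtain with LCP information or the time proportional to m achievable with a suffix tree over a fixed alphabet.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Simplification (assumed): naive sort of explicit suffix slices, which uses
    quadratic memory; linear-time construction algorithms exist.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: List[int]) -> List[int]:
    """Kasai et al.: lcp[r] = longest common prefix of suffixes sa[r-1] and sa[r]."""
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n  # lcp[0] stays 0: the first suffix has no predecessor
    h = 0
    for i in range(n):  # visit suffixes in text order, not rank order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # lexicographic predecessor of suffix i
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix keeps at least h - 1 matching chars
        else:
            h = 0
    return lcp


def occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """All start positions of `pattern`, via two binary searches over `sa`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    # Leftmost rank whose length-m prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    # Leftmost rank whose length-m prefix is > pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes prefixed by `pattern` occupy the contiguous interval [start, lo).
    return sorted(sa[start:lo])


def longest_repeat(text: str) -> str:
    """Longest substring occurring at least twice (occurrences may overlap)."""
    if not text:
        return ""
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    r = max(range(len(lcp)), key=lcp.__getitem__)
    return text[sa[r]:sa[r] + lcp[r]]


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa)                            # [5, 3, 1, 0, 4, 2]
    print(lcp_array(text, sa))           # [0, 1, 3, 0, 0, 2]
    print(occurrences(text, sa, "ana"))  # [1, 3]
    print(longest_repeat(text))          # ana
```

The length of the list returned by `occurrences` is the occurrence count of the pattern, the basic entry of a table of substring statistics; only the two interval endpoints are needed to obtain it, without listing the positions.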

Recommended Reading

Abouelhoda MI, Kurtz S, Ohlebusch E (2004) Replacing suffix trees with enhanced suffix arrays. J Discret Algorithms 2(1):53–86
Apostolico A (1985) The myriad virtues of subword trees. In: Apostolico A, Galil Z (eds) Combinatorial algorithms on words. Springer, Berlin/New York, pp 85–96
Apostolico A, Bejerano G (2000) Optimal amnesic probabilistic automata or how to learn and classify proteins in linear time and space. J Comput Biol 7(3–4):381–393
Apostolico A, Preparata FP (1983) Optimal off-line detection of repetitions in a string. Theor Comput Sci 22(3):297–315
Apostolico A, Bock ME, Lonardi S, Xu X (2000) Efficient detection of unusual words. J Comput Biol 7(1–2):71–94
Apostolico A, Denas O et al (2008) Fast algorithms for computing sequence distances by exhaustive substring composition. Algorithms Mol Biol 3(13)
Beller T, Berger K, Ohlebusch E (2012) Space-efficient computation of maximal and supermaximal repeats in genome sequences. In: 19th international symposium on string processing and information retrieval (SPIRE 2012), Cartagena de Indias. Lecture notes in computer science, vol 7608. Springer, pp 99–110
Hui LCK (1992) Color set size problem with applications to string matching. In: Combinatorial pattern matching, Tucson. Springer, pp 230–243
Crochemore M, Hancart C, Lecroq T (2007) Algorithms on strings. Cambridge University Press, New York
Farach M (1997) Optimal suffix tree construction with large alphabets. In: Proceedings of the 38th annual symposium on foundations of computer science, 1997, Miami Beach. IEEE, pp 137–143
Farach M, Noordewier M, Savari S, Shepp L, Wyner A, Ziv J (1995) On the entropy of DNA: algorithms and measurements based on memory and rapid convergence. In: Proceedings of the sixth annual ACM-SIAM symposium on discrete algorithms (SODA ’95), San Francisco. Society for Industrial and Applied Mathematics, pp 48–57
Ferragina P (1997) Dynamic text indexing under string updates. J Algorithms 22(2):296–328
Fiala ER, Greene DH (1989) Data compression with finite windows. Commun ACM 32(4):490–505. doi:10.1145/63334.63341
Gusfield D (1997) Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge/New York
Gusfield D, Stoye J (2004) Linear time algorithms for finding and representing all the tandem repeats in a string. J Comput Syst Sci 69(4):525–546. doi:10.1016/j.jcss.2004.03.004
Herold J, Kurtz S, Giegerich R (2008) Efficient computation of absent words in genomic sequences. BMC Bioinform 9(1):167
Kärkkäinen J, Sanders P, Burkhardt S (2006) Linear work suffix array construction. J ACM 53(6):918–936
Kasai T, Lee G, Arimura H, Arikawa S, Park K (2001) Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Combinatorial pattern matching, Jerusalem. Springer, pp 181–192
Kim DK, Sim JS, Park H, Park K (2005) Constructing suffix arrays in linear time. J Discret Algorithms 3(2):126–142
Ko P, Aluru S (2003) Space efficient linear time construction of suffix arrays. In: Combinatorial pattern matching, Morelia. Springer, pp 200–210
Kurtz S (1999) Reducing the space requirement of suffix trees. Softw Pract Exp 29:1149–1171
Larsson NJ (1996) Extended application of suffix trees to data compression. In: Data compression conference, Snowbird, pp 190–199
Lempel A, Ziv J (1976) On the complexity of finite sequences. IEEE Trans Inf Theory 22:75–81
Manber U, Myers G (1993) Suffix arrays: a new method for on-line string searches. SIAM J Comput 22(5):935–948
McCreight EM (1976) A space-economical suffix tree construction algorithm. J ACM 23(2):262–272
Muthukrishnan S (2002) Efficient algorithms for document retrieval problems. In: Proceedings of the thirteenth annual ACM-SIAM symposium on discrete algorithms (SODA ’02), San Francisco. Society for Industrial and Applied Mathematics, Philadelphia, pp 657–666. http://dl.acm.org/citation.cfm?id=545381.545469
Ohlebusch E, Gog S, Kügel A (2010) Computing matching statistics and maximal exact matches on compressed full-text indexes. In: 17th international symposium on string processing and information retrieval (SPIRE 2010), Los Cabos, pp 347–358
Puglisi SJ, Smyth WF, Turpin AH (2007) A taxonomy of suffix array construction algorithms. ACM Comput Surv 39(2):4
Rodeh M, Pratt VR, Even S (1981) Linear algorithm for data compression via string matching. J ACM 28(1):16–24
Smola AJ, Vishwanathan S (2003) Fast kernels for string and tree matching. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems (NIPS ’03) 15, Vancouver. MIT, pp 585–592
Stoye J, Gusfield D (2002) Simple and flexible detection of contiguous repeats using a suffix tree. Theor Comput Sci 270(1):843–856
Ukkonen E (1995) On-line construction of suffix trees. Algorithmica 14(3):249–260
Weiner P (1973) Linear pattern matching algorithms. In: IEEE conference record of 14th annual symposium on switching and automata theory (SWAT ’73), Iowa City, 1973. IEEE, pp 1–11

Author information

Authors and Affiliations

College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
Alberto Apostolico
Department of Computer Science, Helsinki Institute for Information Technology (HIIT), University of Helsinki, Helsinki, Finland
Fabio Cunial

Corresponding author

Correspondence to Alberto Apostolico.

Editor information

Editors and Affiliations

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA
Ming-Yang Kao

About this entry

Cite this entry

Apostolico, A., Cunial, F. (2016). Suffix Trees and Arrays. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_627

DOI: https://doi.org/10.1007/978-1-4939-2864-4_627
Published: 22 April 2016
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2863-7
Online ISBN: 978-1-4939-2864-4
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
