Suffix Trees

Ghoting, Amol; Makarychev, Konstantin

doi:10.1007/978-0-387-09766-4_464

Reference work entry

Synonyms

Position tree

Definition

The suffix tree is a data structure that stores all the suffixes of a given string in a compact tree-based structure. Its design allows for a particularly fast implementation of many important string operations.

Discussion

Introduction

The suffix tree is a fundamental data structure in string processing. It exposes the internal structure of a string in a way that facilitates the efficient implementation of a myriad of string operations. Examples of these operations include string matching (both exact and approximate), exact set matching, all-pairs suffix-prefix matching, finding repetitive structures, and finding the longest common substring across multiple strings [12].

Let $A$ denote a set of characters. Let $S = s_0, s_1, \ldots, s_{n-1}, \$$, where $s_i \in A$ and $\$ \notin A$, denote a \$-terminated input string of length $n + 1$. The $i$th suffix of $S$ is the substring $s_i, s_{i+1}, \ldots, s_{n-1}, \$$. The suffix tree for $S$, denoted as T...
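
As a rough illustration of the suffixes defined above, and of why indexing them supports exact string matching, the following Python sketch may help. It builds a naive, uncompressed trie of suffixes (a suffix trie) in quadratic time and space. It is not the compact suffix tree itself, nor any of the linear-time construction algorithms. The function names are hypothetical, and the suffix range i = 0, ..., n-1 is an assumption read off the formula.

```python
def suffixes(text, terminator="$"):
    """Return the suffixes s_i ... s_{n-1} $ for i = 0 .. n-1.

    Assumption: i ranges over the positions of the unterminated text, so the
    suffix consisting of the terminator alone is not listed.
    """
    if terminator in text:
        raise ValueError("terminator must not occur in the input string")
    s = text + terminator
    return [s[i:] for i in range(len(text))]


def build_suffix_trie(text, terminator="$"):
    """Insert every suffix into a nested-dict trie, one character per edge."""
    root = {}
    for suffix in suffixes(text, terminator):
        node = root
        for ch in suffix:
            # Follow the edge labelled ch, creating it if it does not exist.
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """A pattern occurs in the text iff it is a prefix of some suffix."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
print(suffixes("banana"))
# ['banana$', 'anana$', 'nana$', 'ana$', 'na$', 'a$']
print(contains(trie, "nan"), contains(trie, "nab"))
# True False
```

Each query walks at most one edge per pattern character, regardless of the text length. A suffix tree keeps this property while merging chains of single-child nodes, which is what makes the structure compact.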

Bibliography

Apostolico A, Iliopoulos C, Landau G, Schieber B, Vishkin U (1988) Parallel construction of a suffix tree with applications. Algorithmica 3(1–4):347–365
Bray N, Dubchak I, Pachter L (2003) AVID: a global alignment program. Genome research 13(1):97–102
Burrows M, Wheeler D (1994) A block sorting lossless data compression algorithm. Technical report, Digital Equipment Corporation. Palo Alto, California
Crauser A, Ferragina P (2002) A theoretical and experimental study on the construction of suffix arrays in external memory. Algorithmica 32(1):1–35
Delcher A, Kasif S, Fleischmann R, Peterson J, White O, Salzberg S (1999) Alignment of whole genomes. Nucleic Acids Res 27(11):2369–2376
Delcher A, Phillippy A, Carlton J, Salzberg S (2002) Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 30(1)
Dementiev R, Kärkkäinen J, Mehnert J, Sanders P (2008) Better external memory suffix array construction. J Exp Algorithmics (JEA) 12:3–4
Farach-Colton M, Ferragina P, Muthukrishnan S (2000) On the sorting-complexity of suffix tree construction. J ACM 47(6): 987–1011
Futamura N, Aluru S, Kurtz S (2001) Parallel suffix sorting. In: Proceedings 9th international conference on advanced computing and communications. Citeseer, pp 76–81
Ghoting A, Makarychev K (2009) Indexing genomic sequences on the IBM Blue Gene. In: SC ’09: proceedings of the conference on high performance computing networking, storage and analysis. ACM, New York, pp 1–11
Ghoting A, Makarychev K (2009) Serial and parallel methods for I/O efficient suffix tree construction. In: Proceedings of the ACM international conference on management of data. ACM, New York
Gusfield D (1997) Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge
Hariharan R (1994) Optimal parallel suffix tree construction. In: Proceedings of the symposium on theory of computing. ACM, New York
Hunt E, Atkinson M, Irving R (2001) A database index to large biological sequences. In: Proceedings of 27th international conference on very large databases. Morgan Kaufmann, San Francisco
Japp R (2004) The top-compressed suffix tree: a disk resident index for large sequences. In: Proceedings of the bioinformatics workshop at the 21st annual british national conference on databases
Kalyanaraman A, Emrich S, Schnable P, Aluru S (2007) Assembling genomes on large-scale parallel computers. J Parallel Distr Comput 67(12):1240–1255
Kärkkäinen J, Sanders P, Burkhardt S (2006) Linear work suffix array construction. J ACM 53(6):918–936
Ko P, Aluru S (2005) Space efficient linear time construction of suffix arrays. J Discret Algorithms 3(2–4):143–156
Kulla F, Sanders P (2006) Scalable parallel suffix array construction. In: Recent advances in parallel virtual machine and message passing interface: 13th European PVM/MPI User’s Group Meeting, Bonn, Germany, 17–20 September, 2006: proceedings. Springer, New York, p 22
Kurtz S, Choudhuri J, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R (2001) REPuter: the manifold applications of repeat analysis on a genome scale. Nucleic Acids Res 29(22):4633–4642
Kurtz S, Phillippy A, Delcher A, Smoot M, Shumway M, Antonescu C, Salzberg S (2004) Versatile and open software for comparing large genomes. Genome Biol 5(2):R12
Manber U, Myers G (1990) Suffix arrays: a new method for on-line string searches. In: Proceedings of the first annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, Philadelphia, pp 319–327
McCreight E (1976) A space-economical suffix tree construction algorithm. J ACM 23(2)
Meek C, Patel J, Kasetty S (2003) Oasis: an online and accurate technique for local-alignment searches on biological sequences. In: Proceedings of 29th international conference on very large databases
NCBI. Public collections of DNA and RNA sequence reach 100 gigabases, 2005. http://www.nlm.nih.gov/news/press_releases/dna_rna_100_gig.html.
Phoophakdee B, Zaki M (2007) Genome-scale disk-based suffix tree indexing. In: Proceedings of the ACM international conference on management of data. ACM, New York
Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, Fortini ME, Li PW, Apweiler R, Fleischmann W, Cherry JM, Henikoff S, Skupski MP, Misra S, Ashburner M, Birney E, Boguski MS, Brody T, Brokstein P, Celniker SE, Chervitz SA, Coates D, Cravchik A, Gabrielian A, Galle RF, Gelbart WM, George RA, Goldstein LS, Gong F, Guan P, Harris NL, Hay BA, Hoskins RA, Li J, Li Z, Hynes RO, Jones SJ, Kuehl PM, Lemaitre B, Littleton JT, Morrison DK, Mungall C, O’Farrell PH, Pickeral OK, Shue C, Vosshall LB, Zhang J, Zhao Q, Zheng XH, Zhong F, Zhong W, Gibbs R, Venter JC, Adams MD, Lewis S (2000) Comparative genomics of the eukaryotes. Science 287(5461):2204–2215
Sahinalp SC, Vishkin U (1994) Symmetry breaking for suffix tree construction. In: STOC ’94: proceedings of the twenty-sixth annual ACM symposium on theory of computing. ACM, New York, pp 300–309
Tian Y, Tata S, Hankins R, Patel J (2005) Practical methods for constructing suffix trees. VLDB J 14(3):281–299
Tsirogiannis D, Koudas N (2010) Suffix tree construction algorithms on modern hardware. In: EDBT ’10: Proceedings of the 13th international conference on extending database Technology. ACM, New York, pp 263–274
Ukkonen E (1992) Constructing suffix trees on-line in linear time. In: Proceedings of the IFIP 12th world computer congress on algorithms, software, architecture: information processing. North Holland Publishing Co., Amsterdam
Weiner P (1973) Linear pattern matching algorithms. In: Proceedings of 14th annual symposium on switching and automata theory. IEEE Computer Society, Washington, DC
Zamir O, Etzioni O (1998) Web document clustering: a feasibility demonstration. In: Proceedings of 21st international conference on research and development in information retrieval. ACM, New York

Author information

Authors and Affiliations

Data Mining Systems Group, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Amol Ghoting (Dr.) & Konstantin Makarychev (Dr.)

Editor information

Editors and Affiliations

University of Illinois at Urbana-Champaign, Urbana, IL, USA
David Padua

About this entry

Cite this entry

Ghoting, A., Makarychev, K. (2011). Suffix Trees. In: Padua, D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09766-4_464

DOI: https://doi.org/10.1007/978-0-387-09766-4_464
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09765-7
Online ISBN: 978-0-387-09766-4
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
