Definition
The suffix tree is a data structure that stores all the suffixes of a given string in a compact tree-based structure. Its design allows for a particularly fast implementation of many important string operations.
Discussion
Introduction
The suffix tree is a fundamental data structure in string processing. It exposes the internal structure of a string in a way that facilitates the efficient implementation of a myriad of string operations. Examples of these operations include string matching (both exact and approximate), exact set matching, all-pairs suffix-prefix matching, finding repetitive structures, and finding the longest common substring across multiple strings [12].
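As a rough illustration of why indexing suffixes helps with exact string matching, the sketch below builds an uncompressed suffix trie and uses it for substring queries: a pattern occurs in the text exactly when it is a prefix of some suffix. This is not the compact suffix tree itself and not a linear-time construction; it takes quadratic time and space and is meant only for small inputs. The function names `build_suffix_trie` and `contains` are hypothetical, chosen for this example.

```python
def build_suffix_trie(text, terminator="$"):
    """Insert every suffix of text + terminator into a nested-dict trie.

    Naive O(n^2) construction for illustration only; a real suffix tree
    compresses unary paths and can be built in linear time.
    """
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    root = {}
    for i in range(len(s)):
        node = root
        # Walk (and extend) the trie along the i-th suffix.
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Return True if pattern is a prefix of some suffix, i.e. a substring."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))  # True
print(contains(trie, "nab"))  # False
```

Once the index is built, a query walks at most one edge per pattern character, so its cost depends on the pattern length rather than on the text length.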
Let \(A\) denote a set of characters. Let \(S = s_0, s_1, \ldots, s_{n-1}, \$\), where \(s_i \in A\) and \(\$ \notin A\), denote a \(\$\)-terminated input string of length \(n + 1\). The \(i\)th suffix of \(S\) is the substring \(s_i, s_{i+1}, \ldots, s_{n-1}, \$\). The suffix tree for S, denoted as T...
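The definition of a suffix can be made concrete with a small sketch. It assumes that the index i ranges over 0 to n, so that the terminator on its own counts as the last suffix, and it uses `$` as the terminator symbol. The function name `suffixes` is hypothetical.

```python
def suffixes(text, terminator="$"):
    """List the suffixes of the terminated string text + terminator.

    The terminator must not belong to the alphabet of text, mirroring
    the requirement that the terminator is not in A.
    """
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    # The i-th suffix is s[i:], for i = 0 .. n (assumed index range).
    return [s[i:] for i in range(len(s))]


for i, suffix in enumerate(suffixes("banana")):
    print(i, suffix)
```

For an input of n characters this yields n + 1 suffixes, matching the length of the terminated string.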
Bibliography
Apostolico A, Iliopoulos C, Landau G, Schieber B, Vishkin U (1988) Parallel construction of a suffix tree with applications. Algorithmica 3(1–4):347–365
Bray N, Dubchak I, Pachter L (2003) AVID: a global alignment program. Genome Res 13(1):97–102
Burrows M, Wheeler D (1994) A block sorting lossless data compression algorithm. Technical report, Digital Equipment Corporation. Palo Alto, California
Crauser A, Ferragina P (2002) A theoretical and experimental study on the construction of suffix arrays in external memory. Algorithmica 32(1):1–35
Delcher A, Kasif S, Fleischmann R, Peterson J, White O, Salzberg S (1999) Alignment of whole genomes. Nucleic Acids Res 27(11):2369–2376
Delcher A, Phillippy A, Carlton J, Salzberg S (2002) Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 30(1)
Dementiev R, Kärkkäinen J, Mehnert J, Sanders P (2008) Better external memory suffix array construction. J Exp Algorithmics (JEA) 12:3–4
Farach-Colton M, Ferragina P, Muthukrishnan S (2000) On the sorting-complexity of suffix tree construction. J ACM 47(6): 987–1011
Futamura N, Aluru S, Kurtz S (2001) Parallel suffix sorting. In: Proceedings 9th international conference on advanced computing and communications. Citeseer, pp 76–81
Ghoting A, Makarychev K (2009) Indexing genomic sequences on the IBM Blue Gene. In: SC ’09: proceedings of the conference on high performance computing networking, storage and analysis. ACM, New York, pp 1–11
Ghoting A, Makarychev K (2009) Serial and parallel methods for I/O efficient suffix tree construction. In: Proceedings of the ACM international conference on management of data. ACM, New York
Gusfield D (1997) Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge
Hariharan R (1994) Optimal parallel suffix tree construction. In: Proceedings of the symposium on theory of computing. ACM, New York
Hunt E, Atkinson M, Irving R (2001) A database index to large biological sequences. In: Proceedings of 27th international conference on very large databases. Morgan Kaufmann, San Francisco
Japp R (2004) The top-compressed suffix tree: a disk resident index for large sequences. In: Proceedings of the bioinformatics workshop at the 21st annual British national conference on databases
Kalyanaraman A, Emrich S, Schnable P, Aluru S (2007) Assembling genomes on large-scale parallel computers. J Parallel Distr Comput 67(12):1240–1255
Kärkkäinen J, Sanders P, Burkhardt S (2006) Linear work suffix array construction. J ACM 53(6):918–936
Ko P, Aluru S (2005) Space efficient linear time construction of suffix arrays. J Discret Algorithms 3(2–4):143–156
Kulla F, Sanders P (2006) Scalable parallel suffix array construction. In: Recent advances in parallel virtual machine and message passing interface: 13th European PVM/MPI User’s Group Meeting, Bonn, Germany, 17–20 September, 2006: proceedings. Springer, New York, p 22
Kurtz S, Choudhuri J, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R (2001) Reputer: the manifold applications of repeat analysis on a genome scale. Nucleic Acids Res 29(22):4633–4642
Kurtz S, Phillippy A, Delcher A, Smoot M, Shumway M, Antonescu C, Salzberg S (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12
Manber U, Myers G (1990) Suffix arrays: a new method for on-line string searches. In: Proceedings of the first annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, Philadelphia, pp 319–327
McCreight E (1976) A space-economical suffix tree construction algorithm. J ACM 23(2)
Meek C, Patel J, Kasetty S (2003) Oasis: an online and accurate technique for local-alignment searches on biological sequences. In: Proceedings of 29th international conference on very large databases
NCBI. Public collections of DNA and RNA sequence reach 100 gigabases, 2005. http://www.nlm.nih.gov/news/press_releases/dna_rna_100_gig.html.
Phoophakdee B, Zaki M (2007) Genome-scale disk-based suffix tree indexing. In: Proceedings of the ACM international conference on management of data. ACM, New York
Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, Fortini ME, Li PW, Apweiler R, Fleischmann W, Cherry JM, Henikoff S, Skupski MP, Misra S, Ashburner M, Birney E, Boguski MS, Brody T, Brokstein P, Celniker SE, Chervitz SA, Coates D, Cravchik A, Gabrielian A, Galle RF, Gelbart WM, George RA, Goldstein LS, Gong F, Guan P, Harris NL, Hay BA, Hoskins RA, Li J, Li Z, Hynes RO, Jones SJ, Kuehl PM, Lemaitre B, Littleton JT, Morrison DK, Mungall C, O’Farrell PH, Pickeral OK, Shue C, Vosshall LB, Zhang J, Zhao Q, Zheng XH, Zhong F, Zhong W, Gibbs R, Venter JC, Adams MD, Lewis S (2000) Comparative genomics of the eukaryotes. Science 287(5461):2204–2215
Sahinalp SC, Vishkin U (1994) Symmetry breaking for suffix tree construction. In: STOC ’94: proceedings of the twenty-sixth annual ACM symposium on theory of computing. ACM, New York, pp 300–309
Tian Y, Tata S, Hankins R, Patel J (2005) Practical methods for constructing suffix trees. VLDB J 14(3):281–299
Tsirogiannis D, Koudas N (2010) Suffix tree construction algorithms on modern hardware. In: EDBT ’10: Proceedings of the 13th international conference on extending database Technology. ACM, New York, pp 263–274
Ukkonen E (1992) Constructing suffix trees on-line in linear time. In: Proceedings of the IFIP 12th world computer congress on algorithms, software, architecture: information processing. North Holland Publishing Co., Amsterdam
Weiner P (1973) Linear pattern matching algorithms. In: Proceedings of 14th annual symposium on switching and automata theory. IEEE Computer Society, Washington, DC
Zamir O, Etzioni O (1998) Web document clustering: a feasibility demonstration. In: Proceedings of 21st international conference on research and development in information retrieval. ACM, New York
Copyright information
© 2011 Springer Science+Business Media, LLC
About this entry
Cite this entry
Ghoting, A., Makarychev, K. (2011). Suffix Trees. In: Padua, D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09766-4_464
DOI: https://doi.org/10.1007/978-0-387-09766-4_464
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09765-7
Online ISBN: 978-0-387-09766-4
eBook Packages: Computer Science, Reference Module Computer Science and Engineering