Suffix Trees

  • Reference work entry

Synonyms

Position tree

Definition

The suffix tree is a data structure that stores all the suffixes of a given string in a compact tree. Its design allows many important string operations to be implemented particularly efficiently.
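
A minimal Python sketch of the structure just defined, assuming the input string already ends with a unique terminator such as '$' that occurs nowhere else (the convention used in the formal definition in the Introduction). It builds the tree naively by inserting one suffix at a time, which takes quadratic time; it is not one of the linear-time constructions of Weiner [32], McCreight [23], or Ukkonen [31]. The function names build_suffix_tree, contains, and count_leaves are hypothetical, and edge labels are stored as Python strings for readability, whereas space-economical implementations store (start, end) index pairs into the string.

```python
def build_suffix_tree(s):
    """Naive, quadratic-time construction of a compact suffix tree.

    Assumes s ends with a unique terminator (e.g. '$') that occurs nowhere
    else, so no suffix is a prefix of another and every suffix ends at a
    leaf. Each node is a dict mapping the first character of an outgoing
    edge to a pair (edge_label, child_node); leaves are empty dicts.
    """
    root = {}
    for i in range(len(s)):
        node, rest = root, s[i:]  # insert suffix s[i:] starting at the root
        while True:
            first = rest[0]
            if first not in node:
                node[first] = (rest, {})  # no matching edge: add a leaf
                break
            label, child = node[first]
            k = 0  # length of the match between rest and this edge label
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k == len(label):
                node, rest = child, rest[k:]  # whole edge matched: descend
                continue
            # Mismatch inside the edge: split it at offset k. The new internal
            # node keeps the old subtree and gains a leaf for the new suffix.
            mid = {label[k]: (label[k:], child), rest[k]: (rest[k:], {})}
            node[first] = (label[:k], mid)
            break
    return root


def contains(tree, pattern):
    """Exact matching: pattern occurs in s iff it spells a prefix of a root path."""
    node, rest = tree, pattern
    while rest:
        if rest[0] not in node:
            return False
        label, child = node[rest[0]]
        if len(rest) <= len(label):
            return label.startswith(rest)  # pattern ends inside this edge
        if not rest.startswith(label):
            return False
        node, rest = child, rest[len(label):]
    return True


def count_leaves(node):
    """Number of leaves; equals len(s), one leaf per suffix."""
    if not node:
        return 1
    return sum(count_leaves(child) for _, child in node.values())
```

For example, the tree built for "banana$" has 7 leaves, one per suffix, and contains(tree, "nan") returns True while contains(tree, "nab") returns False. A query walks down a single root path, so the number of character comparisons depends on the pattern length rather than on the length of the indexed string.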

Discussion

Introduction

The suffix tree is a fundamental data structure in string processing. It exposes the internal structure of a string in a way that facilitates the efficient implementation of a myriad of string operations. Examples of these operations include string matching (both exact and approximate), exact set matching, all-pairs suffix-prefix matching, finding repetitive structures, and finding the longest common substring across multiple strings [12].
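
The paragraph above lists finding repetitive structures among the operations a suffix tree supports. As one hedged illustration, the Python sketch below finds the longest substring that occurs at least twice. It relies on a standard property: every internal node of a suffix tree has at least two children, so the string spelled on the path from the root to an internal node occurs at least twice, and the internal node with the longest such string gives the longest repeat. The tree is built by naive quadratic-time suffix insertion purely for illustration, the input is assumed to end with a unique '$', and the names build_suffix_tree and longest_repeat are hypothetical.

```python
def build_suffix_tree(s):
    """Naive quadratic-time compact suffix tree; s must end with a unique '$'.

    Node: dict from first edge character to (edge_label, child); leaves are {}.
    """
    root = {}
    for i in range(len(s)):
        node, rest = root, s[i:]
        while True:
            if rest[0] not in node:
                node[rest[0]] = (rest, {})  # new leaf
                break
            label, child = node[rest[0]]
            k = 0
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k == len(label):
                node, rest = child, rest[k:]  # descend past a fully matched edge
                continue
            # split the edge at offset k with a new two-child internal node
            mid = {label[k]: (label[k:], child), rest[k]: (rest[k:], {})}
            node[rest[0]] = (label[:k], mid)
            break
    return root


def longest_repeat(s):
    """Longest substring occurring at least twice in s (s ends with '$').

    Every internal node has >= 2 children, so the string spelled on the path
    to it occurs >= 2 times; the longest such string is the answer.
    """
    best = ""
    stack = [(build_suffix_tree(s), "")]  # (node, string spelled from root)
    while stack:
        node, path = stack.pop()
        for label, child in node.values():
            if child:  # skip leaves; only internal nodes mark repeats
                spelled = path + label
                if len(spelled) > len(best):
                    best = spelled
                stack.append((child, spelled))
    return best
```

With this sketch, longest_repeat("banana$") returns "ana" and longest_repeat("mississippi$") returns "issi"; in both cases the two occurrences overlap, which the suffix tree handles without special treatment.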

Let A denote a set of characters. Let \(S = s_0 s_1 \ldots s_{n-1}\$\), where \(s_i \in A\) and \(\$ \notin A\), denote a $-terminated input string of length \(n+1\). The \(i\)th suffix of S is the substring \(s_i s_{i+1} \ldots s_{n-1}\$\). The suffix tree for S, denoted as T...
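
A small worked instance of this definition, given as a hedged Python sketch with hypothetical helper names. For S = banana$ we have n = 6 and a string of length n + 1 = 7; taking i from 0 to n (an assumption about the index range, with the last suffix being the lone terminator) gives seven suffixes. The second helper illustrates the usual reason for the terminator: without $, the suffix ana of banana is a prefix of the suffix anana, so in a suffix tree it would end partway along an edge instead of at a leaf of its own; with $ appended, no suffix is a prefix of another.

```python
def suffixes(s):
    """Return the suffixes s[i:] for i = 0 .. len(s) - 1."""
    return [s[i:] for i in range(len(s))]


def has_nested_suffix(s):
    """True if some suffix of s is a proper prefix of another suffix."""
    sufs = suffixes(s)
    return any(a != b and b.startswith(a) for a in sufs for b in sufs)


print(suffixes("banana$"))
# ['banana$', 'anana$', 'nana$', 'ana$', 'na$', 'a$', '$']
print(has_nested_suffix("banana"))   # True: 'ana' is a prefix of 'anana'
print(has_nested_suffix("banana$"))  # False
```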

Bibliography

  1. Apostolico A, Iliopoulos C, Landau G, Schieber B, Vishkin U (1988) Parallel construction of a suffix tree with applications. Algorithmica 3(1–4):347–365

  2. Bray N, Dubchak I, Pachter L (2003) AVID: a global alignment program. Genome Res 13(1):97–102

  3. Burrows M, Wheeler D (1994) A block sorting lossless data compression algorithm. Technical report, Digital Equipment Corporation, Palo Alto, California

  4. Crauser A, Ferragina P (2002) A theoretical and experimental study on the construction of suffix arrays in external memory. Algorithmica 32(1):1–35

  5. Delcher A, Kasif S, Fleischmann R, Peterson J, White O, Salzberg S (1999) Alignment of whole genomes. Nucleic Acids Res 27(11):2369–2376

  6. Delcher A, Phillippy A, Carlton J, Salzberg S (2002) Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 30(1)

  7. Dementiev R, Kärkkäinen J, Mehnert J, Sanders P (2008) Better external memory suffix array construction. J Exp Algorithmics (JEA) 12:3–4

  8. Farach-Colton M, Ferragina P, Muthukrishnan S (2000) On the sorting-complexity of suffix tree construction. J ACM 47(6):987–1011

  9. Futamura N, Aluru S, Kurtz S (2001) Parallel suffix sorting. In: Proceedings 9th international conference on advanced computing and communications, pp 76–81

  10. Ghoting A, Makarychev K (2009) Indexing genomic sequences on the IBM Blue Gene. In: SC ’09: proceedings of the conference on high performance computing networking, storage and analysis. ACM, New York, pp 1–11

  11. Ghoting A, Makarychev K (2009) Serial and parallel methods for I/O efficient suffix tree construction. In: Proceedings of the ACM international conference on management of data. ACM, New York

  12. Gusfield D (1997) Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge

  13. Hariharan R (1994) Optimal parallel suffix tree construction. In: Proceedings of the symposium on theory of computing. ACM, New York

  14. Hunt E, Atkinson M, Irving R (2001) A database index to large biological sequences. In: Proceedings of 27th international conference on very large databases. Morgan Kaufmann, San Francisco

  15. Japp R (2004) The top-compressed suffix tree: a disk resident index for large sequences. In: Proceedings of the bioinformatics workshop at the 21st annual British national conference on databases

  16. Kalyanaraman A, Emrich S, Schnable P, Aluru S (2007) Assembling genomes on large-scale parallel computers. J Parallel Distr Comput 67(12):1240–1255

  17. Kärkkäinen J, Sanders P, Burkhardt S (2006) Linear work suffix array construction. J ACM 53(6):918–936

  18. Ko P, Aluru S (2005) Space efficient linear time construction of suffix arrays. J Discret Algorithms 3(2–4):143–156

  19. Kulla F, Sanders P (2006) Scalable parallel suffix array construction. In: Recent advances in parallel virtual machine and message passing interface: 13th European PVM/MPI User’s Group Meeting, Bonn, Germany, 17–20 September, 2006: proceedings. Springer, New York, p 22

  20. Kurtz S, Choudhuri J, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R (2001) REPuter: the manifold applications of repeat analysis on a genome scale. Nucleic Acids Res 29(22):4633–4642

  21. Kurtz S, Phillippy A, Delcher A, Smoot M, Shumway M, Antonescu C, Salzberg S (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12

  22. Manber U, Myers G (1990) Suffix arrays: a new method for on-line string searches. In: Proceedings of the first annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, Philadelphia, pp 319–327

  23. McCreight E (1976) A space-economical suffix tree construction algorithm. J ACM 23(2)

  24. Meek C, Patel J, Kasetty S (2003) OASIS: an online and accurate technique for local-alignment searches on biological sequences. In: Proceedings of 29th international conference on very large databases

  25. NCBI (2005) Public collections of DNA and RNA sequence reach 100 gigabases. http://www.nlm.nih.gov/news/press_releases/dna_rna_100_gig.html

  26. Phoophakdee B, Zaki M (2007) Genome-scale disk-based suffix tree indexing. In: Proceedings of the ACM international conference on management of data. ACM, New York

  27. Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, Fortini ME, Li PW, Apweiler R, Fleischmann W, Cherry JM, Henikoff S, Skupski MP, Misra S, Ashburner M, Birney E, Boguski MS, Brody T, Brokstein P, Celniker SE, Chervitz SA, Coates D, Cravchik A, Gabrielian A, Galle RF, Gelbart WM, George RA, Goldstein LS, Gong F, Guan P, Harris NL, Hay BA, Hoskins RA, Li J, Li Z, Hynes RO, Jones SJ, Kuehl PM, Lemaitre B, Littleton JT, Morrison DK, Mungall C, O’Farrell PH, Pickeral OK, Shue C, Vosshall LB, Zhang J, Zhao Q, Zheng XH, Zhong F, Zhong W, Gibbs R, Venter JC, Adams MD, Lewis S (2000) Comparative genomics of the eukaryotes. Science 287(5461):2204–2215

  28. Sahinalp SC, Vishkin U (1994) Symmetry breaking for suffix tree construction. In: STOC ’94: proceedings of the twenty-sixth annual ACM symposium on theory of computing. ACM, New York, pp 300–309

  29. Tian Y, Tata S, Hankins R, Patel J (2005) Practical methods for constructing suffix trees. VLDB J 14(3):281–299

  30. Tsirogiannis D, Koudas N (2010) Suffix tree construction algorithms on modern hardware. In: EDBT ’10: Proceedings of the 13th international conference on extending database Technology. ACM, New York, pp 263–274

  31. Ukkonen E (1992) Constructing suffix trees on-line in linear time. In: Proceedings of the IFIP 12th world computer congress on algorithms, software, architecture: information processing. North-Holland Publishing Co., Amsterdam

  32. Weiner P (1973) Linear pattern matching algorithms. In: Proceedings of 14th annual symposium on switching and automata theory. IEEE Computer Society, Washington, DC

  33. Zamir O, Etzioni O (1998) Web document clustering: a feasibility demonstration. In: Proceedings of 21st international conference on research and development in information retrieval. ACM, New York

Copyright information

© 2011 Springer Science+Business Media, LLC

About this entry

Cite this entry

Ghoting, A., Makarychev, K. (2011). Suffix Trees. In: Padua, D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09766-4_464
