Approximate Tandem Repeats

  • Reference work entry
Encyclopedia of Algorithms

Years and Authors of Summarized Original Work

  • 2001; Landau, Schmidt, Sokol

  • 2003; Kolpakov, Kucherov

Problem Definition

Identification of periodic structures in words (variants of which are known as tandem repeats, repetitions, powers, or runs) is a fundamental algorithmic task (see entry Squares and Repetitions). In many practical applications, such as DNA sequence analysis, the repetitions considered admit some variation between copies of the repeated pattern. In other words, the repetitions of interest are approximate tandem repeats, not only exact ones.

The simplest instance of an approximate tandem repeat is an approximate square. An approximate square in a word w is a subword uv, where u and v are within a given distance k according to some distance measure between words, such as Hamming distance or edit (also called Levenshtein) distance. There are several ways to define approximate tandem repeats as successions of approximate squares, i.e., to generalize to...
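
The definition of an approximate square can be made concrete with a small brute-force sketch. It assumes Hamming distance, so u and v must have equal length, and it is only an illustration of the definition, not one of the algorithms summarized in this entry. The function names `hamming` and `approximate_squares` are hypothetical and chosen here for illustration.

```python
def hamming(u: str, v: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs words of equal length")
    return sum(a != b for a, b in zip(u, v))


def approximate_squares(w: str, k: int) -> list[tuple[int, int]]:
    """List (start, period) for every subword uv of w with
    |u| = |v| = period and hamming(u, v) <= k.

    Brute force over all periods and start positions; this is a sketch
    of the definition, not an efficient algorithm.
    """
    n = len(w)
    found = []
    for period in range(1, n // 2 + 1):
        for start in range(n - 2 * period + 1):
            u = w[start:start + period]
            v = w[start + period:start + 2 * period]
            if hamming(u, v) <= k:
                found.append((start, period))
    return found


if __name__ == "__main__":
    # "ACGT" followed by "ACCT": the two halves differ in one position.
    print(approximate_squares("ACGTACCT", 1))
```

With k = 0 the function reports only exact squares, so the subword "ACGTACCT" is found for k = 1 but not for k = 0. Under edit distance the two halves may differ in length, so the enumeration above would not carry over unchanged.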

Recommended Reading

  1. Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27:573–580

  2. Boeva VA, Régnier M, Makeev VJ (2004) SWAN: searching for highly divergent tandem repeats in DNA sequences with the evaluation of their statistical significance. In: Proceedings of JOBIM 2004, Montreal, p 40

  3. Butler JM (2001) Forensic DNA typing: biology and technology behind STR markers. Academic Press, San Diego

  4. Crochemore M (1983) Recherche linéaire d’un carré dans un mot. C R Acad Sci Paris Sér I Math 296:781–784

  5. Delgrange O, Rivals E (2004) STAR – an algorithm to search for tandem approximate repeats. Bioinformatics 20:2812–2820

  6. Gelfand Y, Rodriguez A, Benson G (2007) TRDB – the tandem repeats database. Nucleic Acids Res 35(suppl. 1):D80–D87

  7. Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge University Press, Cambridge/New York

  8. Kolpakov R, Kucherov G (1999) Finding maximal repetitions in a word in linear time. In: 40th symposium foundations of computer science (FOCS), New York, pp 596–604. IEEE Computer Society Press

  9. Kolpakov R, Kucherov G (2003) Finding approximate repetitions under Hamming distance. Theor Comput Sci 303(1):135–156

  10. Kolpakov R, Kucherov G (2005) Identification of periodic structures in words. In: Berstel J, Perrin D (eds) Applied combinatorics on words. Encyclopedia of mathematics and its applications. Lothaire books, vol 104, pp 430–477. Cambridge University Press, Cambridge

  11. Kolpakov R, Bana G, Kucherov G (2003) mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res 31(13):3672–3678

  12. Landau GM, Vishkin U (1988) Fast string matching with k differences. J Comput Syst Sci 37(1):63–78

  13. Landau GM, Myers EW, Schmidt JP (1998) Incremental string comparison. SIAM J Comput 27(2):557–582

  14. Landau GM, Schmidt JP, Sokol D (2001) An algorithm for approximate tandem repeats. J Comput Biol 8:1–18

  15. Main M (1989) Detecting leftmost maximal periodicities. Discret Appl Math 25:145–153

  16. Main M, Lorentz R (1984) An O(n log n) algorithm for finding all repetitions in a string. J Algorithms 5(3):422–432

  17. Messer PW, Arndt PF (2007) The majority of recent short DNA insertions in the human genome are tandem duplications. Mol Biol Evol 24(5):1190–1197

  18. Rodeh M, Pratt V, Even S (1981) Linear algorithm for data compression via string matching. J Assoc Comput Mach 28(1):16–24

  19. Sokol D, Benson G, Tojeira J (2006) Tandem repeats over the edit distance. Bioinformatics 23(2):e30–e35

  20. Wexler Y, Yakhini Z, Kashi Y, Geiger D (2005) Finding approximate tandem repeats in genomic sequences. J Comput Biol 12(7):928–942


Acknowledgements

This work was supported in part by the National Science Foundation Grant DB&I 0542751.

Author information

Correspondence to Gregory Kucherov.

Copyright information

© 2016 Springer Science+Business Media New York

Cite this entry

Kucherov, G., Sokol, D. (2016). Approximate Tandem Repeats. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_24
