Approximate Tandem Repeats

Kucherov, Gregory; Sokol, Dina

doi:10.1007/978-1-4939-2864-4_24

Years and Authors of Summarized Original Work

2001; Landau, Schmidt, Sokol
2003; Kolpakov, Kucherov

Problem Definition

Identifying periodic structures in words (variants of which are known as tandem repeats, repetitions, powers, or runs) is a fundamental algorithmic task (see the entry Squares and Repetitions). In many practical applications, such as DNA sequence analysis, the repetitions considered admit some variation between copies of the repeated pattern. In other words, the repetitions of interest are approximate tandem repeats rather than exact repeats only.

The simplest instance of an approximate tandem repeat is an approximate square. An approximate square in a word w is a subword uv, where u and v are within a given distance k according to some distance measure between words, such as Hamming distance or edit (also called Levenshtein) distance. There are several ways to define approximate tandem repeats as successions of approximate squares, i.e., to generalize to...
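The definition of an approximate square under Hamming distance can be made concrete with a small brute-force sketch. It is only an illustration of the definition, not one of the algorithms summarized in this entry, and the function names `hamming` and `approximate_squares` are hypothetical. It assumes |u| = |v|, which holds for Hamming distance but not necessarily for edit distance.

```python
def hamming(u: str, v: str) -> int:
    """Number of mismatching positions between two equal-length words."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs words of equal length")
    return sum(a != b for a, b in zip(u, v))


def approximate_squares(w: str, k: int) -> list[tuple[int, int]]:
    """Return (start, period) for every subword uv of w with
    |u| = |v| = period and hamming(u, v) <= k.

    Brute force over all starts and periods; for illustration only.
    """
    n = len(w)
    found = []
    for period in range(1, n // 2 + 1):
        for start in range(n - 2 * period + 1):
            u = w[start:start + period]
            v = w[start + period:start + 2 * period]
            if hamming(u, v) <= k:
                found.append((start, period))
    return found


if __name__ == "__main__":
    # "ACG" and "ACT" differ in one position, so "ACGACT" is an
    # approximate square for k = 1 but not for k = 0.
    print((0, 3) in approximate_squares("ACGACT", 1))
    print(approximate_squares("abab", 0))
```

Note that for k ≥ 1 every pair of adjacent letters qualifies as an approximate square of period 1, so practical tools typically restrict the period or relate k to it.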

Recommended Reading

Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27:573–580
Boeva VA, Régnier M, Makeev VJ (2004) SWAN: searching for highly divergent tandem repeats in DNA sequences with the evaluation of their statistical significance. In: Proceedings of JOBIM 2004, Montreal, p 40
Butler JM (2001) Forensic DNA typing: biology and technology behind STR markers. Academic Press, San Diego
Crochemore M (1983) Recherche linéaire d’un carré dans un mot. C R Acad Sci Paris Sér I Math 296:781–784
Delgrange O, Rivals E (2004) STAR – an algorithm to search for tandem approximate repeats. Bioinformatics 20:2812–2820
Gelfand Y, Rodriguez A, Benson G (2007) TRDB – the tandem repeats database. Nucleic Acids Res 35(suppl. 1):D80–D87
Gusfield D (1997) Algorithms on strings, trees, and sequences. Cambridge University Press, Cambridge/New York
Kolpakov R, Kucherov G (1999) Finding maximal repetitions in a word in linear time. In: 40th symposium foundations of computer science (FOCS), New York, pp 596–604. IEEE Computer Society Press
Kolpakov R, Kucherov G (2003) Finding approximate repetitions under Hamming distance. Theor Comput Sci 303(1):135–156
Kolpakov R, Kucherov G (2005) Identification of periodic structures in words. In: Berstel J, Perrin D (eds) Applied combinatorics on words. Encyclopedia of mathematics and its applications. Lothaire books, vol 104, pp 430–477. Cambridge University Press, Cambridge
Kolpakov R, Bana G, Kucherov G (2003) mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res 31(13):3672–3678
Landau GM, Vishkin U (1988) Fast string matching with k differences. J Comput Syst Sci 37(1):63–78
Landau GM, Myers EW, Schmidt JP (1998) Incremental string comparison. SIAM J Comput 27(2):557–582
Landau GM, Schmidt JP, Sokol D (2001) An algorithm for approximate tandem repeats. J Comput Biol 8:1–18
Main M (1989) Detecting leftmost maximal periodicities. Discret Appl Math 25:145–153
Main M, Lorentz R (1984) An O(n log n) algorithm for finding all repetitions in a string. J Algorithms 5(3):422–432
Messer PW, Arndt PF (2007) The majority of recent short DNA insertions in the human genome are tandem duplications. Mol Biol Evol 24(5):1190–1197
Rodeh M, Pratt V, Even S (1981) Linear algorithm for data compression via string matching. J Assoc Comput Mach 28(1):16–24
Sokol D, Benson G, Tojeira J (2006) Tandem repeats over the edit distance. Bioinformatics 23(2):e30–e35
Wexler Y, Yakhini Z, Kashi Y, Geiger D (2005) Finding approximate tandem repeats in genomic sequences. J Comput Biol 12(7):928–942

Acknowledgements

This work was supported in part by the National Science Foundation Grant DB&I 0542751.

Author information

Authors and Affiliations

CNRS/LIGM, Université Paris-Est, Marne-la-Vallée, France
Gregory Kucherov
Department of Computer and Information Science, Brooklyn College of CUNY, Brooklyn, NY, USA
Dina Sokol

Corresponding author

Correspondence to Gregory Kucherov.

Editor information

Editors and Affiliations

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA
Ming-Yang Kao

About this entry

Cite this entry

Kucherov, G., Sokol, D. (2016). Approximate Tandem Repeats. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_24

DOI: https://doi.org/10.1007/978-1-4939-2864-4_24
Published: 22 April 2016
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2863-7
Online ISBN: 978-1-4939-2864-4
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
