Computational modelling of interruptional activities between transposable elements using grammars and the linear ordering problem

  • Focus
Soft Computing

Abstract

Transposable elements (TEs) are DNA sequences that can either move or copy themselves to new positions within a genome. They constitute approximately 45% of the human genome. Knowing the evolution of TEs helps in understanding the activities of these elements and their impacts on genomes. In this paper, we devise a formal model whose notation and definitions are compatible with biological nomenclature while still providing a suitable formal foundation for computational analysis. We define sequential interruptions between TEs that occur in a genomic sequence to estimate how often TEs interrupt other TEs, which is useful in predicting their ages. We also describe the problem in terms of a matrix problem, the linear ordering problem. We then define the recursive interruption context-free grammar to capture the recursive way in which TEs nest themselves into other TEs, and associate probabilities with its production rules to convert it into a stochastic context-free grammar; we also discuss how to use the CYK algorithm to find a most likely parse tree predicting TE nesting. Finally, we discuss improvements to the theoretical model and adjust the parse trees to capture both sequential and recursive interruptional activities between TEs and obtain more standard evolutionary trees.
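To make the linear ordering formulation concrete, the following minimal sketch may help. It assumes a small interruption matrix in which entry [i][j] counts how often TE i interrupts TE j, and it searches for the ordering that maximizes the sum of entries above the diagonal. The matrix values, the TE names, and the function names are hypothetical, and the exhaustive search is an illustration only, not the method used in the paper; since the linear ordering problem is NP-hard, realistic instances need exact or heuristic solvers.

```python
from itertools import permutations


def upper_triangle_sum(matrix, order):
    """Sum matrix[a][b] over all pairs where a comes before b in `order`."""
    return sum(
        matrix[order[i]][order[j]]
        for i in range(len(order))
        for j in range(i + 1, len(order))
    )


def solve_lop_exhaustive(matrix):
    """Try every permutation and keep the best one (feasible only for small n)."""
    n = len(matrix)
    best = max(permutations(range(n)),
               key=lambda p: upper_triangle_sum(matrix, p))
    return list(best), upper_triangle_sum(matrix, best)


# Hypothetical counts: interruptions[i][j] = times TE i interrupts TE j.
names = ["TE_A", "TE_B", "TE_C"]
interruptions = [
    [0, 1, 0],
    [5, 0, 2],
    [7, 4, 0],
]

order, score = solve_lop_exhaustive(interruptions)
ranked = [names[i] for i in order]
print(ranked, score)
```

Under the assumption that younger elements tend to interrupt older ones, the resulting order places frequent interrupters first, so it can be read as a rough youngest-to-oldest ranking.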

Notes

  1. hg19, the February 2009 assembly of the human genome.

  2. We calculate the amount that separates them or the amount they overlap using the abs() function to get the absolute value.
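Note 2 can be illustrated with a short sketch. It assumes each TE fragment is given as a (start, end) pair of half-open coordinates on the same sequence; this convention and the function name are assumptions, not taken from the paper. A signed value is computed first, and abs() turns it into the size of the gap or of the overlap.

```python
def separation_or_overlap(first, second):
    """Return ("gap" | "overlap", amount) for two (start, end) intervals.

    Assumes half-open coordinates on the same sequence (hypothetical convention).
    """
    # Positive when the intervals share positions, negative when they are apart.
    signed = min(first[1], second[1]) - max(first[0], second[0])
    kind = "overlap" if signed > 0 else "gap"
    return kind, abs(signed)


print(separation_or_overlap((100, 200), (250, 300)))
print(separation_or_overlap((100, 200), (180, 260)))
```

Adjacent intervals are reported as a gap of 0 under this convention.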

Acknowledgments

This research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada.

Author information

Corresponding author

Correspondence to Lingling Jin.

Additional information

Communicated by C.M. Vide.

About this article

Cite this article

Jin, L., McQuillan, I. Computational modelling of interruptional activities between transposable elements using grammars and the linear ordering problem. Soft Comput 20, 19–35 (2016). https://doi.org/10.1007/s00500-015-1725-2

  • DOI: https://doi.org/10.1007/s00500-015-1725-2
