On the Role of Inverted Repeats in DNA Sequence Similarity

  • Conference paper
11th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2017)

Abstract

In this paper, we propose a computational approach to quantifying inverted repeats. This matters because the presence of inverted repeats in genomic data may be associated with certain chromosomal rearrangements. First, we present a reference-based relative compression method that exploits statistical characteristics of the genomic data. Then, to determine the similarity between genomic sequences, we use the normalized relative compression measure, which is lightweight in both computation time and memory. Testing this approach on various species, including human, chimpanzee, gorilla, chicken, turkey and archaea genomes, we unveil previously unreported results that may support several evolutionary insights.
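To make the idea in the abstract concrete, here is a minimal sketch of a normalized relative compression (NRC) estimate with an optional inverted-repeats mode. It is not the authors' implementation. It assumes a single order-k finite-context model trained only on the reference, additive smoothing with a hypothetical parameter `alpha`, and the commonly used normalization NRC(x‖y) = C(x‖y) / (|x| log2 |A|), where C(x‖y) is the estimated number of bits needed to encode x using a model of y alone. Inverted repeats are handled here by also counting the k-mers of the reference's reverse complement, which is a simplification. The function names are illustrative.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement (the 'inverted repeat' form) of seq."""
    return seq.translate(COMPLEMENT)[::-1]


def build_counts(ref, k, use_ir=False):
    """Count which symbol follows each length-k context in the reference.

    With use_ir=True the reverse-complement strand is counted as well, so
    inverted copies of reference material become predictable (assumption:
    this stands in for an inverted-repeats-aware model update).
    """
    counts = defaultdict(lambda: defaultdict(int))
    strands = [ref, reverse_complement(ref)] if use_ir else [ref]
    for strand in strands:
        for i in range(k, len(strand)):
            counts[strand[i - k:i]][strand[i]] += 1
    return counts


def nrc(target, ref, k=8, alpha=1 / 16, use_ir=False):
    """Estimate NRC(target || ref) with a frozen order-k model of ref."""
    if not target:
        raise ValueError("target must be non-empty")
    counts = build_counts(ref, k, use_ir)
    bits_per_symbol = math.log2(len(ALPHABET))
    # The first k symbols have no full context; charge them at the uniform rate.
    bits = min(k, len(target)) * bits_per_symbol
    for i in range(k, len(target)):
        ctx = counts.get(target[i - k:i], {})
        total = sum(ctx.values())
        # Additive smoothing keeps unseen symbols at non-zero probability.
        p = (ctx.get(target[i], 0) + alpha) / (total + alpha * len(ALPHABET))
        bits -= math.log2(p)
    return bits / (len(target) * bits_per_symbol)
```

Under these assumptions, a value near 0 suggests the target is well described by the reference, and a value near 1 (or slightly above) suggests it is not. Comparing `nrc(x, y, use_ir=True)` with `nrc(x, y, use_ir=False)` gives a rough indication of how much of the similarity between x and y is carried by inverted repeats.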


Acknowledgments

We would like to thank the FCT (Foundation for Science and Technology, Portugal) for supporting this research within the Doctoral Programme FCT MAP-i in Computer Science. We also acknowledge European funds through FEDER, under the COMPETE 2020 and Portugal 2020 programs, in the context of the projects UID/CEC/00127/2013 and PTDC/EEI-SII/6608/2014.

Author information

Corresponding author

Correspondence to Morteza Hosseini.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hosseini, M., Pratas, D., Pinho, A.J. (2017). On the Role of Inverted Repeats in DNA Sequence Similarity. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., Pinto, T. (eds) 11th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2017. Advances in Intelligent Systems and Computing, vol 616. Springer, Cham. https://doi.org/10.1007/978-3-319-60816-7_28

  • DOI: https://doi.org/10.1007/978-3-319-60816-7_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60815-0

  • Online ISBN: 978-3-319-60816-7

  • eBook Packages: Engineering, Engineering (R0)
