A Study of Compression–Based Methods for the Analysis of Barcode Sequences

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2012)

Abstract

This paper introduces a new methodology for the analysis of barcode sequences. A DNA barcode is a very short nucleotide sequence, corresponding in the animal kingdom to the mitochondrial gene cytochrome c oxidase subunit 1, that acts as a unique element for identification and taxonomic purposes. Traditional barcode analysis uses well-established bioinformatics techniques such as sequence alignment and the computation of evolutionary distances and phylogenetic trees. The proposed alignment-free approach uses two different compression-based approximations of the Universal Similarity Metric to compute dissimilarity matrices among barcode sequences of 20 datasets belonging to different species. From these matrices, phylogenetic trees are computed and compared, in terms of topology and branch length, with trees built from evolutionary distances. The results show high similarity between compression-based and evolutionary-based trees, which suggests that the compression-based methodology is worth employing for the study of barcode sequences.
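As an illustration of the general idea, the sketch below computes a compression-based dissimilarity matrix using the Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The abstract does not name the two approximations used in the paper, so the choice of NCD, the use of zlib as the compressor, the averaging over both concatenation orders, and the function names are all assumptions made for this example; general-purpose compressors are also known to work poorly on short DNA strings, so this should not be read as the authors' implementation.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(.))."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two sequences.

    Assumption: zlib is used as the compressor; the paper may use
    DNA-specific compressors instead.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def dissimilarity_matrix(seqs: list[str]) -> list[list[float]]:
    """Symmetric pairwise matrix with a zero diagonal.

    C(xy) and C(yx) can differ slightly for real compressors, so the
    two orders are averaged (a choice made for this sketch).
    """
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            v = 0.5 * (ncd(seqs[i], seqs[j]) + ncd(seqs[j], seqs[i]))
            d[i][j] = d[j][i] = v
    return d


if __name__ == "__main__":
    # Toy sequences for illustration only; not real barcode data.
    toy = [
        "ACGTACGTTGCAACGTTAGCCGATAGCTAGCTAGGATCCGATCGATCGTAGCTAGCTAGCA",
        "ACGTACGTTGCAACGTTAGCCGATAGCTAGCTAGGATCCGATCGATCGTAGCTAGCTAGCT",
        "TTGACCGGTAACCGGTTAAGGCCTTAAGGCCAATTGGCCAATTCCGGAATTCCGGTTAACC",
    ]
    for row in dissimilarity_matrix(toy):
        print(" ".join(f"{v:.3f}" for v in row))
```

A matrix of this kind could then be passed to a distance-based tree-building method, such as neighbor joining or UPGMA, to obtain the phylogenetic trees that the abstract compares against alignment-based ones.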

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

La Rosa, M., Fiannaca, A., Rizzo, R., Urso, A. (2013). A Study of Compression–Based Methods for the Analysis of Barcode Sequences. In: Peterson, L.E., Masulli, F., Russo, G. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2012. Lecture Notes in Computer Science, vol 7845. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38342-7_10

  • DOI: https://doi.org/10.1007/978-3-642-38342-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38341-0

  • Online ISBN: 978-3-642-38342-7

  • eBook Packages: Computer Science (R0)
