Comparison of Genomic Sequences Clustering Using Normalized Compression Distance and Evolutionary Distance

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5179)

Abstract

Genomic sequences are usually compared using evolutionary distance, a procedure that requires aligning the sequences. Aligning long sequences is time-consuming, and the resulting dissimilarity is not a metric. The normalized compression distance was recently introduced as a way to calculate the distance between two generic digital objects, and it appears well suited to comparing genomic strings. In this paper, the clustering and the mapping obtained with a SOM using the traditional evolutionary distance are compared with those obtained using the compression distance, in order to understand whether the two sets of distances are similar. The first results indicate that the two distances capture different aspects of the genomic sequences, and further investigation is needed to reach a definitive result.
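
As an illustration of the normalized compression distance mentioned in the abstract, the following minimal sketch uses the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. The compressor is an assumption: the abstract does not say which one the authors used, so the general-purpose `zlib` from the Python standard library stands in here. The names `compressed_size`, `ncd`, and `random_dna` are hypothetical helpers introduced only for this example, and the sequences are synthetic.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compressed length of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(length: int, seed: int) -> bytes:
    """Synthetic nucleotide string, used only to exercise ncd()."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length)).encode("ascii")


if __name__ == "__main__":
    a = random_dna(2000, seed=1)
    b = random_dna(2000, seed=2)
    # A sequence compared with itself should score much lower than
    # two unrelated sequences.
    print("NCD(a, a) =", round(ncd(a, a), 3))
    print("NCD(a, b) =", round(ncd(a, b), 3))
```

With a real compressor the value is only an approximation: NCD(x, x) is small but generally not exactly zero, and the result depends on the compressor chosen. No alignment is needed, which is the practical appeal noted in the abstract.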

Editor information

Ignac Lovrek, Robert J. Howlett, Lakhmi C. Jain

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

La Rosa, M., Rizzo, R., Urso, A., Gaglio, S. (2008). Comparison of Genomic Sequences Clustering Using Normalized Compression Distance and Evolutionary Distance. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5179. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85567-5_92

  • DOI: https://doi.org/10.1007/978-3-540-85567-5_92

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85566-8

  • Online ISBN: 978-3-540-85567-5

  • eBook Packages: Computer Science, Computer Science (R0)
