A Distance Measure for Genome Phylogenetic Analysis

  • Conference paper
AI 2009: Advances in Artificial Intelligence (AI 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5866)

Abstract

Phylogenetic analyses of species based on single genes or parts of genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The growing number of sequenced genomes allows phylogenies to be constructed from complete genomes, which is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often infeasible because of their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as the evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information-theoretic measure of genetic distance between genomes based on the expert model, a biological compression algorithm. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group, whose genomes are known to contain many horizontally transferred genes.
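The general idea of a compression-based distance can be illustrated with a minimal sketch. It is not the measure defined in the paper: the abstract does not give the formula, and the expert model is not used here. As stated assumptions, the sketch substitutes the general-purpose zlib compressor for a biological compressor and uses the widely known normalized compression distance; the function names `compressed_size`, `ncd`, and `distance_matrix` are hypothetical.

```python
import zlib


def compressed_size(seq: str) -> int:
    """Length in bytes of the zlib-compressed sequence (stand-in compressor)."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    Assumption: this generic formula stands in for the paper's measure.
    Values near 0 suggest shared information; values near 1 suggest little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: dict) -> dict:
    """Pairwise distances keyed by sequence name, mirrored to be symmetric."""
    names = list(seqs)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = ncd(seqs[a], seqs[b])
            matrix[a][b] = d
            matrix[b][a] = d
    return matrix
```

A matrix of this shape is the kind of input a distance-based tree construction method expects. Note that zlib only finds repeats within a 32 KB window, so this stand-in is suitable for short illustrative sequences rather than complete genomes.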


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cao, M.D., Allison, L., Dix, T. (2009). A Distance Measure for Genome Phylogenetic Analysis. In: Nicholson, A., Li, X. (eds) AI 2009: Advances in Artificial Intelligence. AI 2009. Lecture Notes in Computer Science, vol 5866. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10439-8_8

  • DOI: https://doi.org/10.1007/978-3-642-10439-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10438-1

  • Online ISBN: 978-3-642-10439-8

  • eBook Packages: Computer Science, Computer Science (R0)
