A New Efficient Method for Assessing Missing Nucleotides in DNA Sequences in the Framework of a Generic Evolutionary Model

  • Conference paper
Data Science and Classification

Abstract

Phylogenetic inference from datasets that include incomplete characters is among the most relevant problems in systematic biology. In this paper, we propose a new probabilistic method for estimating unknown nucleotides before computing evolutionary distances. The method is developed in the framework of the Tamura-Nei evolutionary model (Tamura and Nei (1993)). The proposed strategy is compared, through simulations, with two existing methods, “Ignoring Missing Sites” (IMS) and “Proportional Distribution of Missing and Ambiguous Bases” (PDMAB), both included in the PAUP package (Swofford (2001)).
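To make the idea behind a strategy such as “Ignoring Missing Sites” concrete, the following minimal sketch computes a pairwise distance while skipping every aligned site at which either sequence has a missing or ambiguous base. It is an illustration only, under several assumptions: the function name `p_distance_ignoring_missing` is hypothetical, the distance is the simple proportion of differing sites (p-distance) rather than the Tamura-Nei distance, and it is not the PAUP implementation nor the method proposed in the paper.

```python
from typing import Optional

# Assumption: only unambiguous nucleotides count as observed; anything else
# ('?', 'N', '-', IUPAC ambiguity codes) is treated as missing.
VALID = frozenset("ACGT")


def p_distance_ignoring_missing(seq_a: str, seq_b: str) -> Optional[float]:
    """Proportion of differing sites, skipping sites with missing data.

    Hypothetical helper for illustration; returns None when no site
    is observed in both sequences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0    # sites observed in both sequences
    mismatches = 0  # observed sites where the nucleotides differ
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID or b not in VALID:
            continue  # ignore the site entirely
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        return None
    return mismatches / compared
```

For example, `p_distance_ignoring_missing("ACGT?A", "ACCTNA")` compares five sites and finds one difference. The sketch also shows the cost of ignoring sites: the more data are missing, the fewer sites support each pairwise distance, which is the situation that estimating the unknown nucleotides aims to avoid.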


References

  • DIALLO, Ab. B., DIALLO, Al. B. and MAKARENKOV, V. (2005): Une nouvelle méthode efficace pour l’estimation des données manquantes en vue de l’inférence phylogénétique. In: Proceedings of the 12th meeting of Société Francophone de Classification. Montréal, Canada, 121–125.

  • FELSENSTEIN, J. and CHURCHILL, G.A. (1996): A hidden Markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution, 13, 93–104.

  • FELSENSTEIN, J. (1997): An alternating least squares approach to inferring phylogenies from pairwise distances. Systematic Biology, 46, 101–111.

  • GASCUEL, O. (1997): An improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14, 685–695.

  • GUINDON, S. and GASCUEL, O. (2002): Efficient biased estimation of evolutionary distances when substitution rates vary across sites. Molecular Biology and Evolution, 19, 534–543.

  • HUELSENBECK, J. P. (1991): When are fossils better than extant taxa in phylogenetic analysis? Systematic Zoology, 40, 458–469.

  • HASEGAWA, M., KISHINO, H. and YANO, T. (1985): Dating the human-ape split by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution, 22, 160–174.

  • HUFFORD, L. (1992): Rosidae and their relationships to other nonmagnoliid dicotyledons: A phylogenetic analysis using morphological and chemical data. Annals of the Missouri Botanical Garden, 79, 218–248.

  • JUKES, T. H. and CANTOR, C. (1969): Evolution of protein molecules. In: Mammalian Protein Metabolism. Academic Press, New York, 21–132.

  • KIMURA, M. (1980): A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.

  • KUHNER, M. and FELSENSTEIN, J. (1994): A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution, 11, 459–468.

  • MAKARENKOV, V. and LECLERC, B. (1999): An algorithm for the fitting of a phylogenetic tree according to a weighted least-squares criterion. Journal of Classification, 16, 3–26.

  • MAKARENKOV, V. and LAPOINTE, F-J. (2004): A weighted least-squares approach for inferring phylogenies from incomplete distance matrices. Bioinformatics, 20, 2113–2121.

  • RAMBAUT, A. and GRASSLY, N. (1997): Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Bioinformatics, 13, 235–238.

  • ROBINSON, D. and FOULDS, L. (1981): Comparison of phylogenetic trees. Mathematical Biosciences, 53, 131–147.

  • SAITOU, N. and NEI, M. (1987): The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4, 406–425.

  • SANDERSON, M.J., PURVIS, A. and HENZE, C. (1998): Phylogenetic supertrees: Assembling the tree of life. Trends in Ecology and Evolution, 13, 105–109.

  • SMITH, J.F. (1997): Tribal relationships within Gesneriaceae: A cladistic analysis of morphological data. Systematic Botany, 21, 497–513.

  • SWOFFORD, D. L. (2001): PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.

  • TAKAHASHI, K. and NEI, M. (2000): Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used. Molecular Biology and Evolution, 17, 1251–1258.

  • TAMURA, K. and NEI, M. (1993): Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10/3, 512–526.

  • WIENS, J. J. (1998): Does adding characters with missing data increase or decrease phylogenetic accuracy? Systematic Biology, 47, 625–640.

  • WIENS, J. J. (2003): Missing data, incomplete taxa, and phylogenetic accuracy. Systematic Biology, 52, 528–538.


Copyright information

© 2006 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Diallo, A.B., Makarenkov, V., Blanchette, M., Lapointe, FJ. (2006). A New Efficient Method for Assessing Missing Nucleotides in DNA Sequences in the Framework of a Generic Evolutionary Model. In: Batagelj, V., Bock, HH., Ferligoj, A., Žiberna, A. (eds) Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-34416-0_36
