Abstract
Phylogenetic inference from datasets that include incomplete characters is among the most relevant problems in systematic biology. In this paper, we propose a new probabilistic method for estimating unknown nucleotides before computing evolutionary distances. The method is developed in the framework of the Tamura-Nei evolutionary model (Tamura and Nei (1993)). Through simulations, the proposed strategy is compared with two existing methods, “Ignoring Missing Sites” (IMS) and “Proportional Distribution of Missing and Ambiguous Bases” (PDMAB), both included in the PAUP package (Swofford (2001)).
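As a rough illustration of the IMS baseline named above, the following Python sketch computes a pairwise distance while skipping every site that is unknown in either sequence. It is not the method proposed in the paper and not PAUP's implementation: the function names, the set of symbols treated as missing, and the use of the simpler Jukes-Cantor correction in place of the Tamura-Nei model are all assumptions made for brevity.

```python
import math

# Assumption: these symbols mark an unknown nucleotide in the alignment.
MISSING = set("?-N")


def ims_p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring sites missing in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # the site is dropped from this pairwise comparison
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no site is known in both sequences")
    return mismatches / compared


def jukes_cantor_distance(p):
    """Jukes-Cantor correction, used here as a stand-in for Tamura-Nei."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = ims_p_distance("ACGT??AC", "ACGA-TAC")
    print(p, jukes_cantor_distance(p))
```

In the example call, two of the eight aligned sites are skipped, so the proportion of differences is computed over the six remaining sites, one of which differs. Discarding sites in this way shrinks the data behind each pairwise distance, which is the limitation that estimating the unknown nucleotides aims to avoid.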
References
DIALLO, Ab. B., DIALLO, Al. B. and MAKARENKOV, V. (2005): Une nouvelle méthode efficace pour l’estimation des données manquantes en vue de l’inférence phylogénétique. In: Proceedings of the 12th meeting of Société Francophone de Classification. Montréal, Canada, 121–125.
FELSENSTEIN, J. and CHURCHILL, G.A. (1996): A hidden Markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution, 13, 93–104.
FELSENSTEIN, J. (1997): An alternating least squares approach to inferring phylogenies from pairwise distances. Systematic Biology, 46, 101–111.
GASCUEL, O. (1997): An improved version of NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14, 685–695.
GUINDON, S. and GASCUEL, O. (2002): Efficient biased estimation of evolutionary distances when substitution rates vary across sites. Molecular Biology and Evolution, 19, 534–543.
HASEGAWA, M., KISHINO, H. and YANO, T. (1985): Dating the human-ape split by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution, 22, 160–174.
HUELSENBECK, J. P. (1991): When are fossils better than extant taxa in phylogenetic analysis? Systematic Zoology, 40, 458–469.
HUFFORD, L. (1992): Rosidae and their relationships to other nonmagnoliid dicotyledons: A phylogenetic analysis using morphological and chemical data. Annals of the Missouri Botanical Garden, 79, 218–248.
JUKES, T. H. and CANTOR, C. (1969): Evolution of protein molecules. In: Mammalian Protein Metabolism. Academic Press, New York, 21–132.
KIMURA, M. (1980): A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.
KUHNER, M. and FELSENSTEIN, J. (1994): A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution, 11, 459–468.
MAKARENKOV, V. and LECLERC, B. (1999): An algorithm for the fitting of a phylogenetic tree according to a weighted least-squares criterion. Journal of Classification, 16, 3–26.
MAKARENKOV, V. and LAPOINTE, F-J. (2004): A weighted least-squares approach for inferring phylogenies from incomplete distance matrices. Bioinformatics, 20, 2113–2121.
RAMBAUT, A. and GRASSLY, N. (1997): Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Bioinformatics, 13, 235–238.
ROBINSON, D. and FOULDS, L. (1981): Comparison of phylogenetic trees. Mathematical Biosciences, 53, 131–147.
SAITOU, N. and NEI, M. (1987): The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4, 406–425.
SANDERSON, M.J., PURVIS, A. and HENZE, C. (1998): Phylogenetic supertrees: Assembling the tree of life. Trends in Ecology and Evolution, 13, 105–109.
SMITH, J.F. (1997): Tribal relationships within Gesneriaceae: A cladistic analysis of morphological data. Systematic Botany, 21, 497–513.
SWOFFORD, D. L. (2001): PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.
TAKAHASHI, K. and NEI, M. (2000): Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used. Molecular Biology and Evolution, 17, 1251–1258.
TAMURA, K. and NEI, M. (1993): Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10/3, 512–526.
WIENS, J. J. (1998): Does adding characters with missing data increase or decrease phylogenetic accuracy? Systematic Biology, 47, 625–640.
WIENS, J. J. (2003): Missing data, incomplete taxa, and phylogenetic accuracy. Systematic Biology, 52, 528–538.
Copyright information
© 2006 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Diallo, A.B., Makarenkov, V., Blanchette, M., Lapointe, FJ. (2006). A New Efficient Method for Assessing Missing Nucleotides in DNA Sequences in the Framework of a Generic Evolutionary Model. In: Batagelj, V., Bock, HH., Ferligoj, A., Žiberna, A. (eds) Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-34416-0_36
DOI: https://doi.org/10.1007/3-540-34416-0_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34415-5
Online ISBN: 978-3-540-34416-2
eBook Packages: Mathematics and Statistics (R0)