Abstract
Phylogenetic methods must account for the biological processes that create incongruence between gene trees and the species phylogeny. Deep coalescence, or incomplete lineage sorting, creates discord among gene trees at the early stages of species divergence, or when the time between speciation events was short and the ancestral population sizes were large. The deep coalescence problem takes a collection of gene trees and seeks the species tree that implies the fewest deep coalescence events, that is, the smallest deep coalescence reconciliation cost. Although this approach can be useful for phylogenetics, the consensus properties of this problem are largely uncharacterized, and the accuracy of heuristics is untested. We prove that the deep coalescence consensus tree problem satisfies the highly desirable Pareto property for clusters (clades). That is, in all instances, each cluster that is present in all of the input gene trees, called a consensus cluster, will also be found in every optimal solution. We introduce an efficient algorithm that, given a candidate species tree that does not display the consensus clusters, will modify the candidate tree so that it includes all of these clusters and has a lower deep coalescence cost. Simulation experiments demonstrate the efficacy of this algorithm, but they also indicate that, even with large trees, most solutions returned by the recent efficient heuristic display the consensus clusters.
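The Pareto property stated above can be checked by brute force on a very small instance. The following is a minimal sketch, not the algorithm from the paper: it assumes rooted binary trees encoded as nested tuples of leaf names, all on the same leaf set, and it computes the deep coalescence cost by the usual extra-lineage count (for each non-root edge of the species tree, the number of gene lineages passing through it minus one). All function names and the two example gene trees are hypothetical.

```python
def clusters(tree):
    """Return the leaf set (frozenset) of every node in a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            leafset = frozenset([node])
        else:
            leafset = frozenset().union(*(walk(child) for child in node))
        found.add(leafset)
        return leafset

    walk(tree)
    return found


def lineages(node, cluster):
    """Return (leaf set of node, number of maximal subtrees of node inside cluster)."""
    if isinstance(node, str):
        return frozenset([node]), int(node in cluster)
    leafset, total = frozenset(), 0
    for child in node:
        child_leaves, count = lineages(child, cluster)
        leafset |= child_leaves
        total += count
    if leafset <= cluster:
        total = 1  # the whole subtree has coalesced into a single lineage
    return leafset, total


def deep_coalescence_cost(gene_tree, species_tree):
    """Sum the extra lineages over all non-root edges of the species tree."""
    species_clusters = clusters(species_tree)
    root = max(species_clusters, key=len)
    return sum(lineages(gene_tree, c)[1] - 1 for c in species_clusters if c != root)


def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching leaf above the root or on an edge."""
    yield (leaf, tree)
    if not isinstance(tree, str):
        left, right = tree
        for t in insert_leaf(left, leaf):
            yield (t, right)
        for t in insert_leaf(right, leaf):
            yield (left, t)


def all_binary_trees(leaves):
    """Enumerate all rooted binary trees on the leaves (only feasible for few leaves)."""
    if len(leaves) == 1:
        yield leaves[0]
        return
    for tree in all_binary_trees(leaves[1:]):
        yield from insert_leaf(tree, leaves[0])


# Hypothetical input: two gene trees that agree only on the clade {a, b}.
gene_trees = [((("a", "b"), "c"), "d"), ((("a", "b"), "d"), "c")]
consensus = set.intersection(*(clusters(g) for g in gene_trees))

# Score every candidate species tree and keep those with minimum total cost.
scored = [(sum(deep_coalescence_cost(g, s) for g in gene_trees), s)
          for s in all_binary_trees(["a", "b", "c", "d"])]
best = min(cost for cost, _ in scored)
optimal = [s for cost, s in scored if cost == best]

# Pareto on clusters: every optimal tree displays every consensus cluster.
print(all(consensus <= clusters(s) for s in optimal))
```

Exhaustive enumeration is used here only because it makes the property easy to inspect; the number of rooted binary trees grows super-exponentially with the number of leaves, so this approach does not scale beyond a handful of taxa.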
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, H.T., Burleigh, J.G., Eulenstein, O. (2011). The Deep Coalescence Consensus Tree Problem is Pareto on Clusters. In: Chen, J., Wang, J., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science, vol 6674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21260-4_19
DOI: https://doi.org/10.1007/978-3-642-21260-4_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21259-8
Online ISBN: 978-3-642-21260-4
eBook Packages: Computer Science, Computer Science (R0)