Abstract
Accurate gene tree-species tree reconciliation is fundamental to understanding evolutionary processes across species. However, within eukaryotes, the most popular algorithms consider only a restricted set of evolutionary events, typically modeling either duplications and losses or coalescences, but not both. Recent work has unified duplications, losses, and coalescences through an intermediate locus tree; however, the associated reconciliation algorithms assume that the gene tree is known and do not account for gene tree reconstruction error. Here, we demonstrate that reconstructing the gene tree independently and then reconciling it substantially degrades accuracy compared to reconciling the true gene tree. To address this challenge, we present DLC-Coestimation, a Bayesian method that simultaneously reconstructs the gene tree and reconciles it with the species tree. We apply our method to two clades, flies and fungi, and demonstrate that it outperforms existing approaches in ortholog, duplication, and loss inference. This work demonstrates the utility of coestimation methods for inference under joint phylogenetic and population genomic models.
Acknowledgments
We thank Matthew D. Rasmussen, Ran Libeskind-Hadas, and Mark Huber for helpful comments, feedback, and discussions. This work was supported by funds from the Department of Computer Science and the Dean of Faculty of Harvey Mudd College.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhang, B., Wu, YC. (2017). Coestimation of Gene Trees and Reconciliations Under a Duplication-Loss-Coalescence Model. In: Cai, Z., Daescu, O., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2017. Lecture Notes in Computer Science, vol 10330. Springer, Cham. https://doi.org/10.1007/978-3-319-59575-7_18
DOI: https://doi.org/10.1007/978-3-319-59575-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59574-0
Online ISBN: 978-3-319-59575-7
eBook Packages: Computer Science (R0)