Coestimation of Gene Trees and Reconciliations Under a Duplication-Loss-Coalescence Model

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10330)

Abstract

Accurate gene tree-species tree reconciliation is fundamental to understanding evolutionary processes across species. However, within eukaryotes, the most popular algorithms consider only a restricted set of evolutionary events, typically modeling only duplications and losses or only coalescences. Recent work has unified duplications, losses, and coalescences through an intermediate locus tree, but the associated reconciliation algorithms assume that the gene tree is known and do not account for gene tree reconstruction error. Here, we demonstrate that reconstructing the gene tree independently and then reconciling it substantially degrades accuracy compared to reconciling the true gene tree. To address this challenge, we present DLC-Coestimation, a Bayesian method that simultaneously reconstructs the gene tree and reconciles it with the species tree. We applied our method to two clades, flies and fungi, and demonstrate that it outperforms existing approaches in ortholog, duplication, and loss inference. This work demonstrates the utility of coestimation methods for inference under joint phylogenetic and population genomic models.

Acknowledgments

We thank Matthew D. Rasmussen, Ran Libeskind-Hadas, and Mark Huber for helpful comments, feedback, and discussions. This work was supported by funds from the Department of Computer Science and the Dean of Faculty of Harvey Mudd College.

Author information

Corresponding author

Correspondence to Yi-Chieh Wu.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 204 KB)

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhang, B., Wu, YC. (2017). Coestimation of Gene Trees and Reconciliations Under a Duplication-Loss-Coalescence Model. In: Cai, Z., Daescu, O., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2017. Lecture Notes in Computer Science, vol 10330. Springer, Cham. https://doi.org/10.1007/978-3-319-59575-7_18

  • DOI: https://doi.org/10.1007/978-3-319-59575-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59574-0

  • Online ISBN: 978-3-319-59575-7

  • eBook Packages: Computer Science, Computer Science (R0)
