Comparing Methods for Species Tree Estimation with Gene Duplication and Loss

  • Conference paper
Algorithms for Computational Biology (AlCoB 2021)

Abstract

Species tree inference from gene trees is an important part of biological research. One confounding factor in estimating species trees is gene duplication and loss, which can produce gene family trees containing multiple copies of the same species. Several methods developed in recent years address this problem and substantially improve on earlier methods; however, the best-performing methods (ASTRAL-Pro, ASTRID-multi, and FastMulRFS) have not yet been directly compared. In this study, we compare these three methods under a wide variety of conditions. Our study shows that while all three have nearly the same accuracy under most conditions, ASTRAL-Pro and ASTRID-multi are more reliably accurate than FastMulRFS (with a small advantage to ASTRID-multi), and ASTRID-multi is often faster than ASTRAL-Pro.

J. Willson and M. S. Roddur—Equal contribution.

Acknowledgments

This work made use of the Illinois Campus Cluster, a computing resource that is operated by the Illinois Campus Cluster Program (ICCP) in conjunction with the National Center for Supercomputing Applications (NCSA) and which is supported by funds from the University of Illinois at Urbana-Champaign.

Author information

Corresponding author

Correspondence to Tandy Warnow.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Willson, J., Roddur, M.S., Warnow, T. (2021). Comparing Methods for Species Tree Estimation with Gene Duplication and Loss. In: Martín-Vide, C., Vega-Rodríguez, M.A., Wheeler, T. (eds) Algorithms for Computational Biology. AlCoB 2021. Lecture Notes in Computer Science, vol 12715. Springer, Cham. https://doi.org/10.1007/978-3-030-74432-8_8

  • DOI: https://doi.org/10.1007/978-3-030-74432-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74431-1

  • Online ISBN: 978-3-030-74432-8

  • eBook Packages: Computer Science, Computer Science (R0)
