Abstract
Species tree inference from gene trees is an important part of biological research. One confounding factor in estimating species trees is gene duplication and loss, which can produce gene family trees containing multiple copies of the same species. Several methods developed in recent years address this problem and substantially improve on earlier methods; however, the best-performing methods (ASTRAL-Pro, ASTRID-multi, and FastMulRFS) have not yet been directly compared. In this study, we compare ASTRAL-Pro, ASTRID-multi, and FastMulRFS under a wide variety of conditions. Our study shows that while all three have nearly the same accuracy under most conditions, ASTRAL-Pro and ASTRID-multi are more reliably accurate than FastMulRFS (with a small advantage to ASTRID-multi), and that ASTRID-multi is often faster than ASTRAL-Pro.
J. Willson and M. S. Roddur: equal contribution.
Acknowledgments
This work made use of the Illinois Campus Cluster, a computing resource that is operated by the Illinois Campus Cluster Program (ICCP) in conjunction with the National Center for Supercomputing Applications (NCSA) and which is supported by funds from the University of Illinois at Urbana-Champaign.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Willson, J., Roddur, M.S., Warnow, T. (2021). Comparing Methods for Species Tree Estimation with Gene Duplication and Loss. In: Martín-Vide, C., Vega-Rodríguez, M.A., Wheeler, T. (eds) Algorithms for Computational Biology. AlCoB 2021. Lecture Notes in Computer Science, vol 12715. Springer, Cham. https://doi.org/10.1007/978-3-030-74432-8_8
DOI: https://doi.org/10.1007/978-3-030-74432-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74431-1
Online ISBN: 978-3-030-74432-8
eBook Packages: Computer Science, Computer Science (R0)