Abstract
Inferring species trees from multi-locus data must account for gene tree discordance arising from various biological processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL). Gene tree parsimony (GTP) is a popular approach for estimating species trees that seeks to minimize the number of evolutionary events required to reconcile the species tree with the gene trees. Minimizing gene duplication (MGD) and minimizing gene duplication and loss (MGDL) are GTP approaches that typically make the simplifying assumption that population-related effects such as ILS are negligible. However, this assumption is problematic for denser phylogenies, where ILS is more prominent. Here, we extend the existing GTP methods to account for both GDL and ILS by minimizing a weighted sum of the GDL and “deep” coalescence events required for a given collection of gene trees. We provide a graph-theoretic characterization of this problem and present a dynamic programming algorithm for solving it. Through an extensive evaluation study on a collection of simulated and empirical datasets, we compare our proposed GTP approaches with the leading methods in the field.
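To make the weighted-sum objective concrete, the following is a minimal sketch that scores one fixed candidate species tree against a set of gene trees. It is not the DynaDup-DLX algorithm: the abstract does not specify how events are counted or combined, so the sketch assumes the classical LCA-mapping counts of duplications and losses for rooted binary trees, and counts deep coalescences as extra lineages per species-tree branch (a definition intended for single-copy gene trees, used here only as a rough proxy). The function names, the nested-tuple tree encoding, and the weights `w_dup`, `w_loss`, and `w_dc` are all hypothetical.

```python
from itertools import chain

# Trees are nested tuples; a leaf is a species name (str). Hypothetical encoding.

def leaves(tree):
    """Return the set of species labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(c) for c in tree))

def species_clusters(species_tree):
    """Map every species-tree cluster (leaf set) to its depth; the root has depth 0."""
    depth = {}
    def walk(node, d):
        depth[leaves(node)] = d
        if not isinstance(node, str):
            for child in node:
                walk(child, d + 1)
    walk(species_tree, 0)
    return depth

def lca(depth, species):
    """LCA mapping: the smallest species-tree cluster containing `species`."""
    return min((c for c in depth if species <= c), key=len)

def count_events(gene_tree, species_tree):
    """Return (duplications, losses, extra_lineages) for one rooted binary gene tree."""
    depth = species_clusters(species_tree)
    dups = losses = 0

    def walk(g):
        nonlocal dups, losses
        m = lca(depth, leaves(g))
        if isinstance(g, str):
            return m
        child_maps = [walk(c) for c in g]
        # A gene node is a duplication if it maps to the same node as a child.
        is_dup = any(cm == m for cm in child_maps)
        dups += is_dup
        for cm in child_maps:
            gap = depth[cm] - depth[m]
            # Classical loss count: skipped species-tree nodes along each child edge.
            losses += gap if is_dup else gap - 1
        return m

    def maximal_clades(g, cluster):
        """Count maximal gene-tree clades whose species all lie in `cluster`."""
        if leaves(g) <= cluster:
            return 1
        if isinstance(g, str):
            return 0
        return sum(maximal_clades(c, cluster) for c in g)

    walk(gene_tree)
    root = leaves(species_tree)
    # Extra lineages on the branch above each non-root species-tree node.
    extra = sum(max(maximal_clades(gene_tree, c) - 1, 0) for c in depth if c != root)
    return dups, losses, extra

def weighted_cost(gene_trees, species_tree, w_dup=1.0, w_loss=1.0, w_dc=1.0):
    """Weighted sum of event counts over all gene trees (weights are assumptions)."""
    total = 0.0
    for g in gene_trees:
        d, l, x = count_events(g, species_tree)
        total += w_dup * d + w_loss * l + w_dc * x
    return total
```

For example, with the species tree `(("A", "B"), ("C", "D"))`, a gene tree identical to it incurs no events, while the gene tree `(("A", "C"), ("B", "D"))` incurs one duplication, four losses, and two extra lineages under these counting rules. A GTP method searches over candidate species trees for one that minimizes such a cost; this sketch only evaluates the cost of a given candidate.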
Data Availability Statement
DynaDup-DLX is freely available in open source form at https://github.com/prottoy99/DynaDup. All the datasets analyzed in this paper are from previously published studies and are publicly available.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Saha, P. et al. (2024). Gene Tree Parsimony in the Presence of Gene Duplication, Loss, and Incomplete Lineage Sorting. In: Scornavacca, C., Hernández-Rosales, M. (eds) Comparative Genomics. RECOMB-CG 2024. Lecture Notes in Computer Science, vol 14616. Springer, Cham. https://doi.org/10.1007/978-3-031-58072-7_6
DOI: https://doi.org/10.1007/978-3-031-58072-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58071-0
Online ISBN: 978-3-031-58072-7
eBook Packages: Mathematics and Statistics (R0)