Gene Tree Parsimony in the Presence of Gene Duplication, Loss, and Incomplete Lineage Sorting

  • Conference paper
Comparative Genomics (RECOMB-CG 2024)

Abstract

Species tree inference from multi-locus data must account for gene tree discordance arising from various biological processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL). Gene tree parsimony (GTP) is a popular approach for estimating species trees that seeks to minimize the number of evolutionary events required to reconcile the species tree with a set of gene trees. Minimize gene duplication (MGD) and minimize gene duplication and loss (MGDL) are GTP approaches that typically make the simplifying assumption that population-related effects such as ILS are negligible. However, this assumption is problematic for denser phylogenies, where ILS is more prominent. Here, we extend the existing GTP methods to account for both GDL and ILS by minimizing a weighted sum of the GDL and “deep” coalescence events required to reconcile a given collection of gene trees. We provide a graph-theoretic characterization of this problem and present a dynamic programming algorithm for it. Through an extensive evaluation study on a collection of simulated and empirical datasets, we compare our proposed GTP approaches with the leading methods in the field.
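
To make the quantities in the objective concrete, the following is a minimal sketch, not the paper's dynamic programming algorithm and not the DynaDup-DLX implementation. It assumes rooted binary trees written as nested tuples, gene tree leaves labelled by species name, and the classical LCA mapping to count duplications, losses, and extra lineages (the usual measure of deep coalescence) for one gene tree against a fixed species tree. The function names `reconcile` and `weighted_cost` and the weights `w_gdl` and `w_dc` are hypothetical. The abstract does not say how the paper apportions discordance between GDL and ILS, so the weighted sum here is shown only as arithmetic over the three classical counts.

```python
from collections import Counter


def index_species_tree(tree):
    """Map every species-tree node (keyed by its leaf set) to its parent and depth."""
    parent = {}

    def clusters(node):
        if isinstance(node, str):
            return frozenset([node])
        kids = [clusters(child) for child in node]
        me = frozenset().union(*kids)
        for kid in kids:
            parent[kid] = me
        return me

    root = clusters(tree)
    parent[root] = None

    def depth_of(node):
        d = 0
        while parent[node] is not None:
            node, d = parent[node], d + 1
        return d

    return parent, {node: depth_of(node) for node in parent}


def lca(a, b, parent):
    """Lowest species node that is an ancestor of (or equal to) both a and b."""
    while not b <= a:
        a = parent[a]
    return a


def reconcile(gene_tree, species_tree):
    """Return (duplications, losses, extra_lineages) under the LCA mapping."""
    parent, depth = index_species_tree(species_tree)
    dups = losses = 0
    lineages = Counter()  # gene lineages per species edge, keyed by the edge's child node

    def visit(g):
        nonlocal dups, losses
        if isinstance(g, str):
            return frozenset([g])
        left, right = (visit(child) for child in g)  # assumes a binary gene tree
        m = lca(left, right, parent)
        is_dup = m in (left, right)  # a child maps to the same species node
        dups += is_dup
        for c in (left, right):
            if c != m:
                # species nodes skipped on the path, plus one if g is a duplication
                losses += depth[c] - depth[m] - 1 + (1 if is_dup else 0)
            node = c
            while node != m:  # this gene lineage crosses every species edge from c up to m
                lineages[node] += 1
                node = parent[node]
        return m

    visit(gene_tree)
    extra = sum(k - 1 for k in lineages.values())
    return dups, losses, extra


def weighted_cost(dups, losses, extra, w_gdl=1.0, w_dc=1.0):
    """Hypothetical weighted sum of GDL events and extra lineages."""
    return w_gdl * (dups + losses) + w_dc * extra


species = (("A", "B"), "C")
print(reconcile((("A", "B"), "C"), species))  # concordant gene tree
print(reconcile((("A", "C"), "B"), species))  # discordant gene tree
```

For the discordant gene tree, the same conflict shows up both as duplication and loss events and as an extra lineage, which is why a method that models GDL and ILS jointly has to decide how to attribute discordance rather than simply add per-criterion counts.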

Data Availability Statement

DynaDup-DLX is freely available in open source form at https://github.com/prottoy99/DynaDup. All the datasets analyzed in this paper are from previously published studies and are publicly available.

Corresponding author

Correspondence to Md. Shamsuzzoha Bayzid.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Saha, P. et al. (2024). Gene Tree Parsimony in the Presence of Gene Duplication, Loss, and Incomplete Lineage Sorting. In: Scornavacca, C., Hernández-Rosales, M. (eds) Comparative Genomics. RECOMB-CG 2024. Lecture Notes in Computer Science, vol 14616. Springer, Cham. https://doi.org/10.1007/978-3-031-58072-7_6
