Fast and Accurate Branch Support Calculation for Distance-Based Phylogenetic Placements

  • Conference paper
  • In: Comparative Genomics (RECOMB-CG 2022)

Abstract

Placing a new sequence onto an existing phylogenetic tree is increasingly used in downstream applications ranging from microbiome analyses to epidemic tracking. Most such applications deal with noisy data, incomplete references, and model misspecifications, all of which make the correct placement uncertain. While recent placement methods have increasingly enabled placement on ultra-large backbone trees with tens to hundreds of thousands of species, they have mostly ignored the issue of uncertainty. Here, we build on the recently developed distance-based phylogenetic placement methodology and show how the distribution of placements can be estimated per input sequence. We compare parametric and non-parametric sampling methods, showing that non-parametric bootstrapping is far more accurate in estimating uncertainty. Finally, we design a linear algebraic formulation of bootstrapping that makes it faster to compute, and we add the computation of support values as a new feature of the APPLES software.
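
To make the idea of non-parametric bootstrapping for placement support concrete, here is a minimal sketch in Python. It is not the APPLES implementation: the function name `bootstrap_support`, the use of a plain Hamming distance, and the "closest reference wins" rule are all assumptions made for illustration, standing in for a real distance-based placement on a backbone tree.

```python
import random
from collections import Counter


def bootstrap_support(query, references, n_replicates=100, seed=0):
    """Estimate support for candidate placements by resampling alignment sites.

    query: aligned query sequence (str).
    references: dict of reference name -> aligned sequence of the same length.
    Returns a dict of reference name -> fraction of bootstrap replicates in
    which that reference was closest to the query. Picking the closest
    reference is a hypothetical stand-in for a real placement method.
    """
    n_sites = len(query)
    if any(len(seq) != n_sites for seq in references.values()):
        raise ValueError("all sequences must have the same aligned length")

    rng = random.Random(seed)
    names = sorted(references)

    # Per-site mismatch indicators (1 where query and reference differ).
    # These are computed once and reused by every replicate.
    mismatch = {
        name: [int(a != b) for a, b in zip(query, references[name])]
        for name in names
    }

    wins = Counter()
    for _ in range(n_replicates):
        # Non-parametric bootstrap: draw n_sites columns with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        # Hamming distance of the query to each reference on resampled sites.
        dist = {name: sum(mismatch[name][c] for c in cols) for name in names}
        # Ties are broken by name so the result is deterministic.
        best = min(names, key=lambda name: (dist[name], name))
        wins[best] += 1

    return {name: wins[name] / n_replicates for name in names}
```

Calling `bootstrap_support("ACGTACGT", {"A": "ACGTACGA", "B": "ACGTACTT"})` returns one support value per reference, and the values sum to 1. Because the per-site mismatch indicators are computed once, each replicate's distances are weighted sums of those indicators, which is the kind of structure a matrix formulation could exploit.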

Author information

Corresponding author

Correspondence to Md. Shamsuzzoha Bayzid.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hasan, N.B., Biswas, A., Balaban, M., Mirarab, S., Bayzid, M.S. (2022). Fast and Accurate Branch Support Calculation for Distance-Based Phylogenetic Placements. In: Jin, L., Durand, D. (eds) Comparative Genomics. RECOMB-CG 2022. Lecture Notes in Computer Science, vol 13234. Springer, Cham. https://doi.org/10.1007/978-3-031-06220-9_3

  • DOI: https://doi.org/10.1007/978-3-031-06220-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06219-3

  • Online ISBN: 978-3-031-06220-9

  • eBook Packages: Computer Science (R0)
