Scalable and Accurate Phylogenetic Placement Using pplacer-XR

  • Conference paper
Algorithms for Computational Biology (AlCoB 2021)

Abstract

Phylogenetic placement, the problem of placing a sequence into a precomputed phylogenetic “backbone” tree, is useful for constructing large trees, performing taxon identification of newly obtained sequences, and other applications. The most accurate current method, pplacer, performs the placement using maximum likelihood but fails frequently on backbone trees with 5000 sequences. We present a simple technique, pplacer-XR (pplacer-eXtra Range), that extends pplacer to large datasets. Using challenging large datasets, we show that pplacer-XR matches the accuracy of pplacer while scaling to ultra-large datasets as well as APPLES, a leading fast phylogenetic placement method. pplacer-XR is available as open-source software on GitHub.

Y. Cai and E. Wedell—Contributed equally to this work.

References

  1. Balaban, M., Roush, D., Zhu, Q., Mirarab, S.: APPLES-2: faster and more accurate distance-based phylogenetic placement using divide and conquer. bioRxiv (2021). https://doi.org/10.1101/2021.02.14.431150

  2. Balaban, M., Sarmashghi, S., Mirarab, S.: APPLES: scalable distance-based phylogenetic placement with or without alignments. Syst. Biol. 69(3), 566–578 (2020)

  3. Barbera, P., et al.: EPA-NG: massively parallel evolutionary placement of genetic sequences. Syst. Biol. 68(2), 365–369 (2019)

  4. Berger, S.A., Krompass, D., Stamatakis, A.: Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Syst. Biol. 60(3), 291–302 (2011)

  5. Bik, H.M., Porazinska, D.L., Creer, S., Caporaso, J.G., Knight, R., Thomas, W.K.: Sequencing our way towards understanding global eukaryotic biodiversity. Trends Ecol. Evol. 27(4), 233–243 (2012)

  6. Chaumeil, P.A., Mussig, A.J., Hugenholtz, P., Parks, D.H.: GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. Bioinformatics 36(6), 1925–1927 (2020)

  7. Conlan, S., Kong, H.H., Segre, J.A.: Species-level analysis of DNA sequence data from the NIH Human Microbiome Project. PLoS ONE 7(10), e47075 (2012)

  8. Matsen, F.A., Kodner, R.B., Armbrust, E.V.: pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinform. 11(1), 538 (2010)

  9. McCoy, C.O., Matsen IV, F.A.: Abundance-weighted phylogenetic diversity measures distinguish microbial community states and are robust to sampling depth. PeerJ 1, e157 (2013)

  10. Mirarab, S., Nguyen, N., Guo, S., Wang, L.S., Kim, J., Warnow, T.: PASTA: ultra-large multiple sequence alignment for nucleotide and amino-acid sequences. J. Comput. Biol. 22(5), 377–386 (2015)

  11. Mirarab, S., Nguyen, N., Warnow, T.: SEPP: SATé-enabled phylogenetic placement. In: Biocomputing 2012, pp. 247–258. World Scientific (2012)

  12. Nguyen, N.P., Mirarab, S., Liu, B., Pop, M., Warnow, T.: TIPP: taxonomic identification and phylogenetic profiling. Bioinformatics 30(24), 3548–3555 (2014)

  13. Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 5(3), e9490 (2010)

  14. Shah, N., Molloy, E.K., Pop, M., Warnow, T.: TIPP2: metagenomic taxonomic profiling using phylogenetic markers. Bioinformatics (2021)

  15. Stamatakis, A.: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22(21), 2688–2690 (2006)

  16. Tavaré, S.: Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math. Life Sci. 17(2), 57–86 (1986)

  17. Yang, Z.: Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39(3), 306–314 (1994)

Acknowledgments

The research presented here is the result of a course project by EW and YC for the Spring 2020 course CS 581: Algorithmic Genomic Biology, at the University of Illinois, taught by TW. This work was supported in part by the National Science Foundation grant ABI-1458652 to TW.

Author information

Corresponding authors

Correspondence to Eleanor Wedell or Tandy Warnow.

Appendix

Commands to create backbone trees: All placement methods use the same backbone tree topologies but different branch lengths, following the protocol in [2]. We downloaded the backbone trees, with branch lengths optimized for each phylogenetic placement method, from the APPLES repository.

On replicates 1 and 2 of the 100,000-leaf backbone condition and replicate 0 of the 200,000-leaf backbone condition, each of which contains more than two identical sequences, the backbone trees had polytomies. We randomly resolved these polytomies so that RAxML could be run on the trees.

Random tree refinement: raxmlHPC-PTHREADS -f e -t res_true.fasttree -m GTRGAMMA -s aln_dna.phy -n REF -p 1984 -T 16
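As an illustration of what randomly resolving a polytomy means, the following Python sketch binarizes a toy tree. It is not the authors' procedure, which relies on the RAxML command above; the nested-tuple tree representation, the function names, and the example tree are all assumptions made for this sketch. It treats the tree as rooted, handles topology only (no branch lengths), and reuses the seed 1984 purely for convenience.

```python
import random


def resolve_polytomies(node, rng):
    """Return a binary version of a nested-tuple tree (hypothetical format).

    A leaf is a string; an internal node is a tuple of child nodes.
    Any node with more than two children is resolved by repeatedly
    joining two randomly chosen children under a new internal node.
    """
    if isinstance(node, str):
        return node
    children = [resolve_polytomies(child, rng) for child in node]
    while len(children) > 2:
        # Pick two distinct child positions; pop the larger index first
        # so the smaller index still points at the intended child.
        i, j = sorted(rng.sample(range(len(children)), 2))
        right = children.pop(j)
        left = children.pop(i)
        children.append((left, right))
    return tuple(children)


def leaves(node):
    """List the leaf names of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def is_binary(node):
    """True if every internal node has exactly two children."""
    if isinstance(node, str):
        return True
    return len(node) == 2 and all(is_binary(child) for child in node)


# Toy tree with a four-way and a three-way polytomy (made-up data).
tree = (("a", "b", "c", "d"), "e", ("f", "g"))
resolved = resolve_polytomies(tree, random.Random(1984))
print(resolved)
```

The resolution keeps the leaf set unchanged and only adds internal nodes, so any clade present in the original tree is still present afterwards.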

APPLES command: run_apples.py -t backbone.tree -s ref.fa -q query.fa -T 16 -o apples.jplace

EPA-ng command: epa-ng --ref-msa ref.fa --tree backbone.tree --query query.fa --outdir $query --model RAxML_info.REF8 --redo -T 16

pplacer-XR commands: python3 pplacer-XR.py GTR RAxML_info.REF backbone.tree output_dir aln.fa query.txt 2000
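The APPLES command above writes its result to a file with the .jplace extension. As a rough sketch of how such output might be inspected, the Python snippet below reads a jplace document, which is JSON with "tree", "fields", and "placements" entries. The function name and the embedded example document are made up for illustration, and the set of fields a given tool writes may differ from the ones shown here.

```python
import json


def read_placements(jplace_text):
    """Map each query name to its list of placements (one dict per row).

    Assumes the usual jplace layout: "fields" names the columns of every
    row in a placement's "p" array, and query names appear under "n"
    (names) or "nm" (name and multiplicity pairs).
    """
    doc = json.loads(jplace_text)
    fields = doc["fields"]
    result = {}
    for entry in doc["placements"]:
        rows = [dict(zip(fields, row)) for row in entry["p"]]
        names = entry.get("n", [])
        if isinstance(names, str):
            names = [names]
        names = list(names) + [name for name, _mult in entry.get("nm", [])]
        for name in names:
            result[name] = rows
    return result


# Made-up example document; real files are produced by the placement tool.
example = """
{
  "tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3}){4};",
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "placements": [
    {"p": [[1, -1234.5, 0.9, 0.01, 0.02],
           [2, -1236.0, 0.1, 0.03, 0.02]],
     "n": ["q1"]}
  ],
  "version": 3,
  "metadata": {}
}
"""

placements = read_placements(example)
for name, rows in placements.items():
    print(name, "-> edge", rows[0]["edge_num"])
```

In practice the text would come from reading the output file, for example open("apples.jplace").read(), rather than from an inline string.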

GitHub site: https://github.com/chry04/pplacer_plusplus

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Wedell, E., Cai, Y., Warnow, T. (2021). Scalable and Accurate Phylogenetic Placement Using pplacer-XR. In: Martín-Vide, C., Vega-Rodríguez, M.A., Wheeler, T. (eds) Algorithms for Computational Biology. AlCoB 2021. Lecture Notes in Computer Science, vol 12715. Springer, Cham. https://doi.org/10.1007/978-3-030-74432-8_7

  • DOI: https://doi.org/10.1007/978-3-030-74432-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74431-1

  • Online ISBN: 978-3-030-74432-8

  • eBook Packages: Computer Science, Computer Science (R0)
