Enhancing Searches for Optimal Trees Using SIESTA

  • Conference paper
Comparative Genomics (RECOMB-CG 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10562)

Abstract

Many supertree estimation and multi-locus species tree estimation methods compute trees by combining trees on subsets of the species set, guided by an NP-hard optimization criterion. A recent approach to computing large trees constrains the search space by defining a set of “allowed bipartitions”, and then uses dynamic programming to find provably optimal solutions within that space in polynomial time. Several phylogenomic estimation methods, such as ASTRAL, the MDC algorithm in PhyloNet, and FastRFS, use this approach. We present SIESTA, a method that allows the dynamic programming method to return a data structure that compactly represents all the optimal trees in the search space. As a result, SIESTA provides multiple capabilities, including: (1) counting the number of optimal trees, (2) calculating consensus trees, (3) generating a random optimal tree, and (4) annotating each branch of a given optimal tree with the proportion of optimal trees in which it appears. SIESTA is available as open source on GitHub at https://github.com/pranjalv123/SIESTA.
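
The constrained dynamic programming idea in the abstract, together with capability (1), can be illustrated with a minimal sketch. This is not SIESTA's implementation: it is a simplified rooted analogue that works with allowed clades rather than unrooted bipartitions, and the function name `count_optimal_trees` and the `weight` scoring function are hypothetical stand-ins for a method-specific criterion such as the ones optimized by ASTRAL or FastRFS. Integer weights are assumed so that ties can be detected exactly.

```python
from functools import lru_cache


def count_optimal_trees(taxa, allowed_clades, weight):
    """Return (best_score, number_of_optimal_trees) for rooted binary trees
    whose clades all come from allowed_clades.

    weight is a hypothetical scoring function mapping a clade (a frozenset
    of taxa) to an integer; a tree's score is the sum over its clades.
    """
    root = frozenset(taxa)
    allowed = {frozenset(c) for c in allowed_clades}
    allowed |= {frozenset([t]) for t in root}  # leaves are always allowed
    allowed.add(root)

    @lru_cache(maxsize=None)
    def solve(clade):
        if len(clade) == 1:
            return weight(clade), 1
        anchor = min(clade)  # visit each unordered split {left, right} once
        best, count = None, 0
        for left in allowed:
            if anchor not in left or not left < clade:
                continue
            right = clade - left
            if right not in allowed:
                continue
            left_best, left_count = solve(left)
            right_best, right_count = solve(right)
            if left_count == 0 or right_count == 0:
                continue  # one side cannot be resolved within the constraints
            score = weight(clade) + left_best + right_best
            ways = left_count * right_count
            if best is None or score > best:
                best, count = score, ways  # strictly better: restart the count
            elif score == best:
                count += ways  # tie: these trees are also optimal
        return best, count

    return solve(root)


# Three taxa, every pair allowed, constant weight: all 3 rooted trees tie.
print(count_optimal_trees("abc", ["ab", "ac", "bc"], lambda clade: 0))
```

Storing the tie count next to the best score at every clade is what lets the same table answer "how many optimal trees are there?" without enumerating them; the running time stays polynomial in the number of allowed clades.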

Acknowledgments

We thank the anonymous reviewers for their helpful criticisms of an earlier draft, which greatly improved the manuscript. We also thank Erin Molloy, Sarah Christensen, and Siavash Mirarab for feedback on the initial results.

Funding. This study made use of the Illinois Campus Cluster, a computing resource that is operated by the Illinois Campus Cluster Program in conjunction with the National Center for Supercomputing Applications and which is supported by funds from the University of Illinois at Urbana-Champaign. This work was partially supported by the U.S. National Science Foundation Graduate Research Fellowship Program under Grant Number DGE-1144245 to PV and by U.S. National Science Foundation grant CCF-1535977 to TW.

Supplementary Materials

Table 1. We show the mean number of optimal trees for ASTRAL, averaged over 25 replicates of 50-taxon simulated datasets with 5 genes that vary in the level of missing data. AD12 is moderate ILS, AD31 is high ILS, and AD68 is very high ILS.

Table 2. We show the mean number of optimal trees for ASTRAL, averaged over 10 replicates of 50-taxon simulated datasets with 10 genes that vary in the level of missing data. AD12 is moderate ILS, AD31 is high ILS, and AD68 is very high ILS.

Table 3. We show the mean number of optimal trees for ASTRAL, averaged over 10 replicates of 50-taxon simulated datasets with 25 genes that vary in the level of missing data. AD12 is moderate ILS, AD31 is high ILS, and AD68 is very high ILS.

Table 4. Number of optimal trees (in scientific notation) for ASTRAL, FastRFS-basic, and FastRFS-enhanced on SMIDGen simulated supertree datasets with varying numbers of taxa and genes, and differing scaffold factors. ASTRAL has several orders of magnitude fewer optimal trees than FastRFS-basic and FastRFS-enhanced.

Fig. 7. The strict consensus of FastRFS trees is more accurate than FastRFS. We show Delta-error (change in mean topological error between FastRFS and the strict consensus of FastRFS trees) on simulated supertree datasets with 100, 500, and 1000 species; values below 0 indicate that the strict consensus of the FastRFS trees is more accurate (i.e., it has lower error) than FastRFS. The figure shows how the percentage of taxa in the scaffold source tree impacts accuracy, averaged over 10 replicates for 1000-taxon data and 25 replicates for 100- and 500-taxon data. Error bars indicate the standard error; the topological error is the average of the FN and FP error rates.
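
The quantities named in this caption can be made concrete with a short sketch. It assumes each tree is given as a set of its non-trivial bipartitions (each encoded as a frozenset holding one canonical side), that the FN rate is normalized by the number of bipartitions in the true tree and the FP rate by the number in the estimated tree, and that the standard error is taken over paired per-replicate differences. The function names are hypothetical, and none of these conventions is confirmed by the caption itself.

```python
from math import sqrt
from statistics import mean, stdev


def fn_fp_rates(true_bips, est_bips):
    """FN rate: share of true bipartitions missing from the estimate.
    FP rate: share of estimated bipartitions absent from the true tree.
    (Normalization is an assumption; see the lead-in.)"""
    true_bips, est_bips = set(true_bips), set(est_bips)
    fn = len(true_bips - est_bips) / len(true_bips) if true_bips else 0.0
    fp = len(est_bips - true_bips) / len(est_bips) if est_bips else 0.0
    return fn, fp


def topological_error(true_bips, est_bips):
    """Average of the FN and FP rates, as stated in the caption."""
    fn, fp = fn_fp_rates(true_bips, est_bips)
    return (fn + fp) / 2


def delta_error(single_errors, consensus_errors):
    """Mean per-replicate change (consensus minus single tree) and its
    standard error; negative values favor the strict consensus."""
    deltas = [c - s for s, c in zip(single_errors, consensus_errors)]
    se = stdev(deltas) / sqrt(len(deltas)) if len(deltas) > 1 else 0.0
    return mean(deltas), se
```

With this sign convention, a consensus tree that drops one false bipartition from a single optimal tree without losing a true one lowers the FP rate, leaves the FN rate unchanged, and so yields a negative Delta-error.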

Fig. 8. Mean error rates for a single FastRFS tree and the strict consensus of all FastRFS trees on the supertree datasets with 100, 500, and 1000 species, as a function of the number of optimal trees. We show FP and FN rates for each method; these are equal for default FastRFS (because its output is always binary), but differ for the strict consensus of the FastRFS trees. As the number of optimal trees increases, the decrease in the FP rate is larger than the increase in the FN rate for the strict consensus of the FastRFS trees, explaining why the average error for the strict consensus of the FastRFS trees is lower than for a single FastRFS tree (as shown in Fig. 7). Results for 193 replicates are shown on 1000-taxon data, results for 312 replicates are shown on 500-taxon data, and results for 104 replicates are shown on 100-taxon data.
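
Why the strict consensus can only lose false positives and only gain false negatives follows from its definition, which the sketch below illustrates. It assumes each tree is represented by its set of non-trivial bipartitions (frozensets holding the side that excludes a fixed reference taxon, here `e`); the function names and the five-taxon example are hypothetical and are not taken from the paper's data.

```python
def strict_consensus(trees):
    """Bipartitions present in every input tree."""
    trees = [set(t) for t in trees]
    return set.intersection(*trees) if trees else set()


def error_counts(true_bips, est_bips):
    """Return (false_negatives, false_positives) as raw counts."""
    true_bips, est_bips = set(true_bips), set(est_bips)
    return len(true_bips - est_bips), len(est_bips - true_bips)


# Hypothetical example on taxa a..e.
true_tree = {frozenset("ab"), frozenset("abc")}
optimal_1 = {frozenset("ab"), frozenset("abd")}
optimal_2 = {frozenset("ab"), frozenset("cd")}

consensus = strict_consensus([optimal_1, optimal_2])
print(error_counts(true_tree, optimal_1))
print(error_counts(true_tree, consensus))
```

Because the consensus keeps only bipartitions shared by all optimal trees, its bipartition set is a subset of each input tree's set: every false positive in the consensus is a false positive in each input tree, and every false negative of an input tree is still missing from the consensus.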

Fig. 9. The strict consensus of ASTRAL trees is more accurate than ASTRAL when gene trees are incomplete. We show Delta-error (change in mean topological error between ASTRAL and the strict consensus of ASTRAL trees) on simulated phylogenomic datasets with varying numbers of incomplete gene trees on 50-species datasets with three different ILS levels; values below 0 indicate that the strict consensus of the ASTRAL trees is more accurate (i.e., it has lower error) than ASTRAL. Note that computing the strict consensus of the optimal ASTRAL trees instead of a single ASTRAL tree gives a large advantage under the highest amount of missing data, and that the advantage decreases as the amount of missing data decreases. We show results for 25 replicates. Error bars indicate the standard error; topological error is the average of the FN and FP error rates.

Fig. 10. Mean change in error of the strict consensus of the ASTRAL trees relative to a single ASTRAL tree on the 50-taxon phylogenomic datasets with varying degrees of missing data and ILS, as a function of the number of optimal trees. Values below zero indicate that the strict consensus tree has better accuracy (lower error) than a single ASTRAL tree. We show the change in FP rates (blue, solid line) and in FN rates (red, dashed line); the black line represents the baseline. This figure shows that the strict consensus has a lower FP rate and a higher FN rate than a single ASTRAL tree, but also that the reduction in false positives is larger than the increase in false negatives. The figure also shows that the reduction in false positives increases with the number of optimal trees. (Color figure online)

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Vachaspati, P., Warnow, T. (2017). Enhancing Searches for Optimal Trees Using SIESTA. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science, vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_13

  • DOI: https://doi.org/10.1007/978-3-319-67979-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67978-5

  • Online ISBN: 978-3-319-67979-2

  • eBook Packages: Computer Science (R0)
