Enhancing Searches for Optimal Trees Using SIESTA

Vachaspati, Pranjal; Warnow, Tandy

doi:10.1007/978-3-319-67979-2_13

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10562)

Included in the following conference series:

RECOMB International Workshop on Comparative Genomics

Abstract

Many supertree estimation and multi-locus species tree estimation methods compute trees by combining trees on subsets of the species set according to an NP-hard optimization criterion. A recent approach to computing large trees constrains the search space by defining a set of “allowed bipartitions” and then uses dynamic programming to find provably optimal solutions within that space in polynomial time. Several phylogenomic estimation methods, such as ASTRAL, the MDC algorithm in PhyloNet, and FastRFS, use this approach. We present SIESTA, a method that allows the dynamic programming method to return a data structure that compactly represents all the optimal trees in the search space. As a result, SIESTA provides multiple capabilities, including: (1) counting the number of optimal trees, (2) calculating consensus trees, (3) generating a random optimal tree, and (4) annotating the branches of a given optimal tree with the proportion of optimal trees in which they appear. SIESTA is available as open-source software on GitHub at https://github.com/pranjalv123/SIESTA.
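The counting capability described in the abstract can be illustrated with a small sketch. This is not SIESTA's implementation: it assumes a simplified rooted setting in which the constraint set is a collection of allowed clades, and a hypothetical additive score given by a user-supplied `weight` function on clades. The function name `count_optimal_trees` and all parameters are illustrative. The dynamic program stores, for each allowed clade, the best achievable score together with the number of ways to reach it, which is the kind of bookkeeping that makes counting optimal trees possible without enumerating them.

```python
from functools import lru_cache


def count_optimal_trees(taxa, allowed_clades, weight):
    """Return (best_score, count) over rooted binary trees whose clades are all allowed.

    weight(clade) is a hypothetical additive score for including a clade;
    the result is (None, 0) if no tree can be built from the allowed clades.
    """
    allowed = {frozenset(c) for c in allowed_clades}
    allowed |= {frozenset([t]) for t in taxa}  # single leaves are always allowed
    root = frozenset(taxa)
    allowed.add(root)

    @lru_cache(maxsize=None)
    def solve(clade):
        if len(clade) == 1:
            return (0, 1)  # a leaf: score 0, exactly one way to build it
        best, count = None, 0
        anchor = min(clade)  # force the anchor into `left` so each split is seen once
        for left in allowed:
            if anchor not in left or not left < clade:
                continue
            right = clade - left
            if right not in allowed:
                continue
            left_score, left_count = solve(left)
            right_score, right_count = solve(right)
            if left_count == 0 or right_count == 0:
                continue  # one side cannot be resolved into allowed clades
            score = left_score + right_score + weight(left) + weight(right)
            ways = left_count * right_count
            if best is None or score > best:
                best, count = score, ways  # strictly better: reset the count
            elif score == best:
                count += ways  # tie: accumulate the number of optimal trees
        return (best, count)

    return solve(root)


# Toy example: every non-trivial clade scores 1, so all five trees tie.
unit = lambda clade: 1 if len(clade) > 1 else 0
print(count_optimal_trees(list("abcd"), ["ab", "cd", "abc", "bcd", "bc"], unit))  # (2, 5)
```

Keeping a count next to each best score means the same table could also drive uniform random sampling of an optimal tree (choosing each split with probability proportional to its number of ways), though that extension is not shown here.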

Acknowledgments

We thank the anonymous reviewers for their helpful criticisms of an earlier draft, which greatly improved the manuscript. We also thank Erin Molloy, Sarah Christensen, and Siavash Mirarab for feedback on the initial results.

Funding. This study made use of the Illinois Campus Cluster, a computing resource that is operated by the Illinois Campus Cluster Program in conjunction with the National Center for Supercomputing Applications and which is supported by funds from the University of Illinois at Urbana-Champaign. This work was partially supported by the U.S. National Science Foundation Graduate Research Fellowship Program under Grant Number DGE-1144245 to PV and by U.S. National Science Foundation grant CCF-1535977 to TW.

Author information

Authors and Affiliations

Department of Computer Science, University of Illinois, Urbana, IL, 61801, USA
Pranjal Vachaspati & Tandy Warnow

Editor information

Editors and Affiliations

University of Campinas, Campinas, São Paulo, Brazil
Joao Meidanis
Rice University, Houston, Texas, USA
Luay Nakhleh

Supplementary Materials

Table 1. We show the mean number of optimal trees for ASTRAL, averaged over 25 replicates of 50-taxon simulated datasets with 5 genes that vary in the level of missing data. AD12 is moderate ILS, AD31 is high ILS, and AD68 is very high ILS.

Table 2. We show the mean number of optimal trees for ASTRAL, averaged over 10 replicates of 50-taxon simulated datasets with 10 genes that vary in the level of missing data. AD12 is moderate ILS, AD31 is high ILS, and AD68 is very high ILS.

Table 3. We show the mean number of optimal trees for ASTRAL, averaged over 10 replicates of 50-taxon simulated datasets with 25 genes that vary in the level of missing data. AD12 is moderate ILS, AD31 is high ILS, and AD68 is very high ILS.

Table 4. Number of optimal trees (in scientific notation) for ASTRAL, FastRFS-basic, and FastRFS-enhanced on SMIDGen simulated supertree datasets with varying numbers of taxa and genes, and differing scaffold factors. ASTRAL has several orders of magnitude fewer optimal trees than FastRFS-basic and FastRFS-enhanced.

About this paper

Cite this paper

Vachaspati, P., Warnow, T. (2017). Enhancing Searches for Optimal Trees Using SIESTA. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science, vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_13

DOI: https://doi.org/10.1007/978-3-319-67979-2_13
Published: 15 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67978-5
Online ISBN: 978-3-319-67979-2
eBook Packages: Computer Science, Computer Science (R0)
