Abstract
Many supertree estimation and multi-locus species tree estimation methods compute trees by combining trees on subsets of the species set based on some NP-hard optimization criterion. A recent approach to computing large trees has been to constrain the search space by defining a set of “allowed bipartitions”, and then to use dynamic programming to find provably optimal solutions in polynomial time. Several phylogenomic estimation methods, such as ASTRAL, the MDC algorithm in PhyloNet, and FastRFS, use this approach. We present SIESTA, a method that allows the dynamic programming method to return a data structure that compactly represents all the optimal trees in the search space. As a result, SIESTA provides multiple capabilities, including: (1) counting the number of optimal trees, (2) calculating consensus trees, (3) generating a random optimal tree, and (4) annotating each branch of a given optimal tree with the proportion of optimal trees in which it appears. SIESTA is available as open-source software on GitHub at https://github.com/pranjalv123/SIESTA.
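To make the idea of counting optimal trees in a constrained search space concrete, the following is a minimal Python sketch and not SIESTA's actual implementation. It makes several simplifying assumptions that do not come from the paper: it works with rooted clades rather than unrooted bipartitions, it maximizes an additive score, and the names `count_optimal_trees`, `allowed_clades`, and `weight` are hypothetical. For each allowed clade it records the best achievable score together with the number of binary resolutions that attain it.

```python
from functools import lru_cache


def count_optimal_trees(taxa, allowed_clades, weight):
    """Return (best_score, count) over rooted binary trees whose clades are all allowed.

    Illustrative assumptions: `weight(left, right)` is a hypothetical additive
    score for splitting a clade into two child clades; higher is better.
    """
    allowed = {frozenset(c) for c in allowed_clades}
    allowed |= {frozenset([t]) for t in taxa}  # leaves are always allowed
    root = frozenset(taxa)
    allowed.add(root)

    @lru_cache(maxsize=None)
    def solve(clade):
        if len(clade) == 1:
            return (0, 1)  # a single leaf: score 0, exactly one tree
        best, count = None, 0
        anchor = min(clade)  # force anchor into `left` so each split is seen once
        for left in allowed:
            if anchor not in left or not left < clade:
                continue
            right = clade - left
            if right not in allowed:
                continue
            left_score, left_count = solve(left)
            right_score, right_count = solve(right)
            if left_count == 0 or right_count == 0:
                continue  # one side cannot be resolved within the allowed set
            score = left_score + right_score + weight(left, right)
            if best is None or score > best:
                best, count = score, left_count * right_count
            elif score == best:
                count += left_count * right_count  # ties add more optimal trees
        return (best, count)

    return solve(root)


# Hypothetical toy input: four taxa and a handful of allowed clades.
TAXA = ["a", "b", "c", "d"]
ALLOWED = ["ab", "cd", "abc", "ac", "bd"]


def prefer_ab(left, right):
    # Hypothetical score: reward any split that creates the clade {a, b}.
    return 1 if frozenset("ab") in (left, right) else 0


best_score, n_optimal = count_optimal_trees(TAXA, ALLOWED, prefer_ab)
```

Keeping a count alongside each best score is what lets the same table support counting; storing the tied splits themselves, rather than only their number, would be one way to also sample a random optimal tree or compute branch frequencies.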
Acknowledgments
We thank the anonymous reviewers for their helpful criticisms of an earlier draft, which greatly improved the manuscript. We also thank Erin Molloy, Sarah Christensen, and Siavash Mirarab for feedback on the initial results.
Funding. This study made use of the Illinois Campus Cluster, a computing resource that is operated by the Illinois Campus Cluster Program in conjunction with the National Center for Supercomputing Applications and which is supported by funds from the University of Illinois at Urbana-Champaign. This work was partially supported by the U.S. National Science Foundation Graduate Research Fellowship Program under Grant Number DGE-1144245 to PV and by U.S. National Science Foundation grant CCF-1535977 to TW.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Vachaspati, P., Warnow, T. (2017). Enhancing Searches for Optimal Trees Using SIESTA. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science, vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_13
DOI: https://doi.org/10.1007/978-3-319-67979-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67978-5
Online ISBN: 978-3-319-67979-2
eBook Packages: Computer Science (R0)