Abstract
We propose a novel summary-based method to infer species trees from input multi-locus gene trees affected by incomplete lineage sorting (ILS). The method extends an existing technique called STAR [13], which defines the average coalescence rank between taxa pairs (couplets) and derives species trees from it using Neighbor-Joining (NJ) [20, 23]. Such a coalescence rank, however, is ambiguous at the couplet level. We propose two new couplet-based distance measures, termed accumulated coalescence rank (AcR) and excess gene tree leaves (XL), and show that their combination discriminates individual couplets better. We then propose a new method, AcRNJXL, which uses these measures for NJ-based species tree construction. On biological datasets, AcRNJXL substantially outperforms STAR and other reference approaches, while retaining the time and space complexities of STAR.
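The average coalescence rank that the method builds on can be illustrated with a minimal Python sketch. It assumes rooted gene trees written as nested tuples with string leaves, and it assumes the STAR-style convention that the root receives a rank equal to the number of taxa, with the rank decreasing by one at each level towards the leaves. The function names (`leaves`, `couplet_ranks`, `average_coalescence_rank`) and the toy trees are hypothetical. The sketch does not implement AcR or XL, whose definitions are not given in this abstract.

```python
from collections import defaultdict
from itertools import combinations


def leaves(tree):
    """Return the leaf labels under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def couplet_ranks(tree, root_rank):
    """Map each couplet to the rank of its most recent common ancestor.

    Assumption: the root has rank `root_rank` and every step towards
    the leaves lowers the rank by one.
    """
    ranks = {}

    def visit(node, rank):
        if isinstance(node, str):
            return
        child_leaves = [leaves(child) for child in node]
        # Leaves under different children coalesce exactly at this node.
        for left, right in combinations(child_leaves, 2):
            for a in left:
                for b in right:
                    ranks[frozenset((a, b))] = rank
        for child in node:
            visit(child, rank - 1)

    visit(tree, root_rank)
    return ranks


def average_coalescence_rank(gene_trees):
    """Average the coalescence rank of every couplet over the gene trees."""
    taxa = {taxon for tree in gene_trees for taxon in leaves(tree)}
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tree in gene_trees:
        for couplet, rank in couplet_ranks(tree, len(taxa)).items():
            totals[couplet] += rank
            counts[couplet] += 1
    return {couplet: totals[couplet] / counts[couplet] for couplet in totals}


# Three toy gene trees on four taxa (hypothetical data).
gene_trees = [
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
]
avg = average_coalescence_rank(gene_trees)
for couplet in sorted(avg, key=sorted):
    print(sorted(couplet), round(avg[couplet], 3))
```

The resulting couplet-to-rank table plays the role of a distance matrix that could be handed to an NJ implementation. Couplets that coalesce near the leaves in most gene trees receive a low average rank and are therefore joined early.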
References
Chaudhary, R., Bansal, M.S., Wehe, A., Fernández-Baca, D., Eulenstein, O.: iGTP: a software package for large-scale gene tree parsimony analysis. BMC Bioinform. 11(574), 1–7 (2010)
Chaudhary, R., Burleigh, J.G., Fernández-Baca, D.: Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance. Algorithms Mol. Biol. 8:28(1), 1–12 (2013)
Chiari, Y., Cahais, V., Galtier, N., Delsuc, F.: Phylogenomic analyses support the position of turtles as the sister group of birds and crocodiles (archosauria). BMC Biol. 10(65), 1–14 (2012)
DeGiorgio, M., Degnan, J.: Robustness to divergence time underestimation when inferring species trees from estimated gene trees. Syst. Biol. 63(1), 66–82 (2014)
Degnan, J.H., Rosenberg, N.A.: Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. 24(6), 332–340 (2009)
Heled, J., Drummond, A.J.: Bayesian inference of species trees from multilocus data. Mol. Biol. Evol. 27(3), 570–580 (2010)
Kubatko, L.S., Carstens, B.C., Knowles, L.: Stem: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics 25(7), 971–973 (2009)
Larget, B.R., Kotha, S.K., Dewey, C.N., Ané, C.: Bucky: Gene tree/species tree reconciliation with bayesian concordance analysis. Bioinformatics 26(22), 2910–2911 (2010)
Liu, L.: Best: bayesian estimation of species trees under the coalescent model. Bioinformatics 24(21), 2542–2543 (2008)
Liu, L., Xi, Z., Wu, S., Davis, C.C., Edwards, S.V.: Estimating phylogenetic trees from genome-scale data. Ann. N. Y. Acad. Sci. 1360(1), 36–53 (2015)
Liu, L., Yu, L.: Estimating species trees from unrooted gene trees. Syst. Biol. 60(5), 661–667 (2011)
Liu, L., Yu, L., Edwards, S.V.: A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol. 10(302), 1–18 (2010)
Liu, L., Yu, L., Pearl, D.K., Edwards, S.V.: Estimating species phylogenies using coalescence times among sequences. Syst. Biol. 58(5), 468–477 (2009)
Mirarab, S., Bayzid, M.S., Warnow, T.: Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting. Syst. Biol. p. syu063 (2014). doi:10.1093/sysbio/syu063
Mirarab, S., Reaz, R., Bayzid, M.S., Zimmermann, T., Swenson, M.S., Warnow, T.: Astral: genome-scale coalescent-based species tree estimation. Bioinformatics 30(17), i541–i548 (2014)
Mirarab, S., Warnow, T.: ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12), i44–i52 (2015)
Mossel, E., Roch, S.: Incomplete lineage sorting: consistent phylogeny estimation from multiple loci. IEEE/ACM Trans. Comput. Biol. Bioinform. 7(1), 166–171 (2010)
Nakhleh, L.: Computational approaches to species phylogeny inference and gene tree reconciliation. Trends Ecol. Evol. 28(12), 719–728 (2013)
Roch, S., Steel, M.: Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent. Theor. Population Biol. 100, 56–62 (2015)
Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4(4), 406–425 (1987)
Song, S., Liu, L., Edwards, S.V., Wu, S.: Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc. Nat. Acad. Sci. USA 109(37), 14942–14947 (2012)
Stamatakis, A.: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22(21), 2688–2690 (2006)
Studier, J.A., Keppler, K.L.: A note on the neighbor-joining algorithm of saitou and nei. Mol. Biol. Evol. 5(6), 729–731 (1988)
Sukumaran, J., Holder, M.T.: DendroPy: a python library for phylogenetic computing. Bioinformatics 26(12), 1569–1571 (2010)
Than, C., Nakhleh, L.: Species tree inference by minimizing deep coalescences. PLOS Comput. Biol. 5(9), 1–12 (2009)
Wickett, N.J., et al.: Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc. Nat. Acad. Sci. USA 111(45), E4859–E4868 (2014)
Xi, Z., Liu, L., Rest, J.S., Davis, C.C.: Coalescent versus concatenation methods and the placement of amborella as sister to water lilies. Syst. Biol. 63(6), 919–932 (2014)
Yu, Y., Warnow, T., Nakhleh, L.: Algorithms for MDC-based multi-locus phylogeny inference: beyond rooted binary gene trees on single alleles. J. Comput. Biol. 18(11), 1543–1559 (2011)
Acknowledgments
The first author acknowledges Tata Consultancy Services (TCS) for providing the research scholarship.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Bhattacharyya, S., Mukhopadhyay, J. (2016). Accumulated Coalescence Rank and Excess Gene Count for Species Tree Inference. In: Botón-Fernández, M., Martín-Vide, C., Santander-Jiménez, S., Vega-Rodríguez, M.A. (eds) Algorithms for Computational Biology. AlCoB 2016. Lecture Notes in Computer Science, vol 9702. Springer, Cham. https://doi.org/10.1007/978-3-319-38827-4_8
DOI: https://doi.org/10.1007/978-3-319-38827-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38826-7
Online ISBN: 978-3-319-38827-4
eBook Packages: Computer Science, Computer Science (R0)