Abstract
Microbial biodiversity is typically assessed by sequencing either PCR-amplified marker genes or all genomic DNA from environmental samples. Both approaches rely on the similarity of the sequenced material to known entries in sequence databases. However, amplicons of non-marker genes are often used when the research question involves assessing both the functional capabilities of a microbial community and its biodiversity. In such cases, a phylogenetic tree is constructed from both known and metagenomic sequences, and expert assessment determines the taxonomic groups to which the amplicons belong. Here, instead of relying on reference sequences of non-marker genes, which are often unavailable, we use tree reconciliation to obtain a distribution of mappings between genes and species. We describe efficient algorithms for reconstructing gene-species mappings, and a Monte Carlo method for inferring their distribution when the number of optimal reconstructions is large. A comparative study of different cost functions shows that the duplication-loss cost induces mappings of the highest quality. Finally, we demonstrate the correctness of our approach on several datasets.
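The abstract names the ingredients (tree reconciliation, the duplication-loss cost, and a distribution over optimal gene-species mappings) but not the algorithms. The Python sketch below illustrates these ingredients under stated assumptions. It is not the authors' method. It computes the classical LCA mapping and a unit-weight duplication-loss cost. It then enumerates, by brute force, every assignment of unlabeled metagenomic leaves to species and keeps the assignments of minimum cost. For each unknown leaf, it reports the fraction of optimal assignments that map that leaf to each species. This enumeration is exponential in the number of unknown leaves, unlike the efficient algorithms and Monte Carlo method described in the paper. The tree encoding (nested 2-tuples, with `"?"` marking a leaf of unknown species) and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding: a tree is a nested 2-tuple, a leaf is a string.
# In the gene tree, "?" marks a metagenomic leaf whose species is unknown.

def leaves(tree):
    """Leaf labels in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def species_clusters(stree, depth=0, out=None):
    """Map each species-tree node (as its frozenset of leaves) to its depth."""
    if out is None:
        out = {}
    out[frozenset(leaves(stree))] = depth
    if not isinstance(stree, str):
        species_clusters(stree[0], depth + 1, out)
        species_clusters(stree[1], depth + 1, out)
    return out

def lca_cluster(labels, clusters):
    """LCA mapping: the smallest species cluster containing all labels."""
    return min((c for c in clusters if labels <= c), key=len)

def dl_cost(gtree, clusters):
    """Return (mapped cluster, duplications, losses) for a fully labeled gene tree."""
    if isinstance(gtree, str):
        return frozenset([gtree]), 0, 0
    c1, d1, l1 = dl_cost(gtree[0], clusters)
    c2, d2, l2 = dl_cost(gtree[1], clusters)
    m = lca_cluster(c1 | c2, clusters)
    dup = int(m == c1 or m == c2)  # duplication if a child maps to the same node
    # Standard loss count: path lengths to the children's images, minus 2, plus 2 at duplications.
    loss = (clusters[c1] - clusters[m]) + (clusters[c2] - clusters[m]) - 2 + 2 * dup
    return m, d1 + d2 + dup, l1 + l2 + loss

def fill(gtree, labels):
    """Replace each "?" leaf, left to right, with the next label from an iterator."""
    if isinstance(gtree, str):
        return next(labels) if gtree == "?" else gtree
    return (fill(gtree[0], labels), fill(gtree[1], labels))

def mapping_distribution(gtree, stree):
    """Brute force: optimal DL cost, number of optimal assignments, per-leaf species frequencies."""
    clusters = species_clusters(stree)
    species = leaves(stree)
    k = leaves(gtree).count("?")
    best, optimal = None, []
    for assignment in product(species, repeat=k):
        _, d, l = dl_cost(fill(gtree, iter(assignment)), clusters)
        cost = d + l  # unit weights for duplications and losses (an assumption)
        if best is None or cost < best:
            best, optimal = cost, [assignment]
        elif cost == best:
            optimal.append(assignment)
    counts = [Counter(a[i] for a in optimal) for i in range(k)]
    dist = [{s: n / len(optimal) for s, n in c.items()} for c in counts]
    return best, len(optimal), dist
```

For example, consider the species tree `(("a", "b"), "c")` and the gene tree `(("a", "?"), "c")`. Only assigning `b` to the unknown leaf reaches cost 0, so the distribution is `[{"b": 1.0}]`. With the gene tree `("?", "?")`, the two optimal assignments `(a, b)` and `(b, a)` tie. Each unknown leaf therefore maps to `a` or `b` with probability 0.5. This kind of ambiguity is what motivates reporting a distribution rather than a single mapping.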
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Betkier, A., Szczęsny, P., Górecki, P. (2015). Fast Algorithms for Inferring Gene-Species Associations. In: Harrison, R., Li, Y., Măndoiu, I. (eds) Bioinformatics Research and Applications. ISBRA 2015. Lecture Notes in Computer Science(), vol 9096. Springer, Cham. https://doi.org/10.1007/978-3-319-19048-8_4
DOI: https://doi.org/10.1007/978-3-319-19048-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19047-1
Online ISBN: 978-3-319-19048-8
eBook Packages: Computer Science, Computer Science (R0)