Abstract
Gene tree-species tree reconciliation problems infer the patterns and processes of gene evolution within the context of an organismal phylogeny. In one example, the gene duplication problem seeks the evolutionary scenario that implies the minimum number of gene duplications needed to reconcile a gene tree and a species tree. While the gene duplication problem can effectively link gene and species evolution, error in gene trees can profoundly bias the results. We describe novel algorithms that rapidly search local Subtree Prune and Regraft (SPR) or Tree Bisection and Reconnection (TBR) neighborhoods of a gene tree to find a topology that implies the fewest duplications. These algorithms improve on the current solutions by a factor of n for searching SPR neighborhoods and n² for searching TBR neighborhoods, where n is the number of vertices in the given gene tree. They provide a fast error correction protocol for gene trees, in which we allow small gene tree rearrangements to improve the reconciliation cost. We tested the SPR tree rearrangement algorithm on a collection of 1201 plant gene trees, and in every case, the SPR algorithm identified an alternate topology that implied at least one fewer duplication. We also demonstrate a simple method to use the gene rearrangement algorithm to improve gene tree parsimony phylogenetic analyses, which infer a species tree based on the gene duplication problem.
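To make the objective concrete, the following is a minimal sketch of the duplication cost and of a brute-force search over the rooted SPR neighborhood of a gene tree. It is not the accelerated algorithm described in this paper: it simply rescores every neighbor from scratch, which is the kind of baseline the paper's algorithms improve upon. The tree encoding (nested 2-tuples with species names as leaf labels) and all function names (`count_duplications`, `spr_neighbors`, `best_spr_neighbor`) are assumptions introduced for illustration only.

```python
def clusters(tree):
    """Return (leaf set of tree, list of the leaf sets of all its nodes)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    left, left_all = clusters(tree[0])
    right, right_all = clusters(tree[1])
    here = left | right
    return here, left_all + right_all + [here]


def lca_map(species_set, species_clusters):
    """Smallest species-tree cluster containing species_set (the LCA mapping)."""
    return min((c for c in species_clusters if species_set <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species node as a child."""
    _, species_clusters = clusters(species_tree)

    def visit(node):
        # Returns (species cluster this node maps to, duplications below it).
        if isinstance(node, str):
            return lca_map(frozenset([node]), species_clusters), 0
        m_left, d_left = visit(node[0])
        m_right, d_right = visit(node[1])
        m_here = lca_map(m_left | m_right, species_clusters)
        is_dup = m_here == m_left or m_here == m_right
        return m_here, d_left + d_right + int(is_dup)

    return visit(gene_tree)[1]


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of 0/1 steps."""
    yield path, tree
    if not isinstance(tree, str):
        yield from subtrees(tree[0], path + (0,))
        yield from subtrees(tree[1], path + (1,))


def prune(tree, path):
    """Cut the subtree at a non-empty path; return (remaining tree, subtree)."""
    if len(path) == 1:
        return tree[1 - path[0]], tree[path[0]]
    child, pruned = prune(tree[path[0]], path[1:])
    rest = (child, tree[1]) if path[0] == 0 else (tree[0], child)
    return rest, pruned


def regraft(tree, path, subtree):
    """Attach subtree on the edge above the node at path (empty path = root)."""
    if not path:
        return (tree, subtree)
    if path[0] == 0:
        return (regraft(tree[0], path[1:], subtree), tree[1])
    return (tree[0], regraft(tree[1], path[1:], subtree))


def spr_neighbors(tree):
    """Yield every rooted SPR rearrangement (may repeat topologies)."""
    for path, _ in subtrees(tree):
        if not path:
            continue  # the root itself cannot be pruned
        rest, pruned = prune(tree, path)
        for target, _ in subtrees(rest):
            yield regraft(rest, target, pruned)


def best_spr_neighbor(gene_tree, species_tree):
    """Naive search: rescore each SPR neighbor and keep the cheapest one."""
    best = min(spr_neighbors(gene_tree),
               key=lambda t: count_duplications(t, species_tree))
    return best, count_duplications(best, species_tree)


if __name__ == "__main__":
    species = (("A", "B"), "C")
    gene = (("A", "C"), "B")  # disagrees with the species tree
    print(count_duplications(gene, species))
    print(best_spr_neighbor(gene, species))
```

In this toy case the gene tree disagrees with the species tree and implies one duplication, while a single SPR move yields a topology that implies none. The sketch assumes every gene-tree leaf label appears in the species tree; a missing label would raise a `ValueError` in `lca_map`.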
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chaudhary, R., Burleigh, J.G., Eulenstein, O. (2011). Algorithms for Rapid Error Correction for the Gene Duplication Problem. In: Chen, J., Wang, J., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science, vol 6674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21260-4_23
DOI: https://doi.org/10.1007/978-3-642-21260-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21259-8
Online ISBN: 978-3-642-21260-4
eBook Packages: Computer Science, Computer Science (R0)