Algorithms for Rapid Error Correction for the Gene Duplication Problem

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2011)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6674)

Abstract

Gene tree - species tree reconciliation problems infer the patterns and processes of gene evolution within the context of an organismal phylogeny. In one example, the gene duplication problem seeks the evolutionary scenario that implies the minimum number of gene duplications needed to reconcile a gene tree and a species tree. While the gene duplication problem can effectively link gene and species evolution, error in gene trees can profoundly bias the results. We describe novel algorithms that rapidly search local Subtree Prune and Regraft (SPR) or Tree Bisection and Reconnection (TBR) neighborhoods of a gene tree to find a topology that implies the fewest duplications. These algorithms improve on the current solutions by a factor of n for searching SPR neighborhoods and n² for searching TBR neighborhoods, where n is the number of vertices in the given gene tree. They provide a fast error correction protocol for gene trees, in which we allow small gene tree rearrangements to improve the reconciliation cost. We tested the SPR tree rearrangement algorithm on a collection of 1201 plant gene trees, and in every case, the SPR algorithm identified an alternate topology that implied at least one fewer duplication. We also demonstrate a simple method to use the gene rearrangement algorithm to improve gene tree parsimony phylogenetic analyses, which infer a species tree based on the gene duplication problem.
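
To make the reconciliation cost concrete, the following is a minimal sketch of counting duplications with the standard least-common-ancestor (LCA) mapping. It is not the paper's fast SPR/TBR search algorithm. The tree encoding (nested 2-tuples with species names as leaf labels) and the function names `clusters`, `lca`, and `duplications` are assumptions made for illustration, and the LCA lookup is deliberately naive.

```python
# Illustrative sketch only: trees are rooted and binary, written as nested
# 2-tuples; every gene-tree leaf is labelled with the species it was sampled from.

def clusters(tree):
    """Return (leaf set of tree, list of the leaf sets of all its nodes)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    left_set, left_all = clusters(tree[0])
    right_set, right_all = clusters(tree[1])
    here = left_set | right_set
    return here, left_all + right_all + [here]


def lca(species_clusters, leaves):
    """Smallest species-tree cluster containing `leaves`, i.e. their LCA."""
    return min((c for c in species_clusters if leaves <= c), key=len)


def duplications(gene_tree, species_tree):
    """Count gene-tree nodes whose LCA mapping equals that of a child."""
    _, species_clusters = clusters(species_tree)
    count = 0

    def visit(node):
        nonlocal count
        if isinstance(node, str):
            return lca(species_clusters, frozenset([node]))
        left = visit(node[0])
        right = visit(node[1])
        mapped = lca(species_clusters, left | right)
        if mapped == left or mapped == right:
            count += 1  # this node maps to the same species node as a child
        return mapped

    visit(gene_tree)
    return count


species = (("A", "B"), "C")
print(duplications((("A", "B"), "C"), species))  # gene tree agrees with species tree
print(duplications((("A", "C"), "B"), species))  # discordant gene tree
```

A naive error-correction search would recompute this cost from scratch for every tree in an SPR or TBR neighborhood; the speedups stated in the abstract are relative to the existing solutions for that search.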

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chaudhary, R., Burleigh, J.G., Eulenstein, O. (2011). Algorithms for Rapid Error Correction for the Gene Duplication Problem. In: Chen, J., Wang, J., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science(), vol 6674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21260-4_23

  • DOI: https://doi.org/10.1007/978-3-642-21260-4_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21259-8

  • Online ISBN: 978-3-642-21260-4

  • eBook Packages: Computer Science (R0)
