Efficiently Finding the Most Parsimonious Phylogenetic Tree Via Linear Programming

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4463)

Abstract

Reconstruction of phylogenetic trees is a fundamental problem in computational biology. While excellent heuristic methods are available for many variants of this problem, new advances in phylogeny inference will be required if we are to continue making effective use of the rapidly growing stores of variation data now being gathered. In this paper, we introduce an integer linear programming formulation to find the most parsimonious phylogenetic tree from a set of binary variation data. The method uses a flow-based formulation that may require exponentially many variables and constraints in the worst case. The method has, however, proved extremely efficient in practice on datasets that are well beyond the reach of the available provably efficient methods. The program solves several large mtDNA and Y-chromosome instances within a few seconds, giving provably optimal results in times competitive with fast heuristics that cannot guarantee optimality.
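To make the objective concrete: for binary data, the most parsimonious tree can be viewed as a minimum Steiner tree in the hypercube of all 0/1 sequences, with the observed sequences as terminals and one unit of cost per mutation. The sketch below is not the paper's integer linear program. It is a brute-force illustration, feasible only for a handful of sites, that tries every set of unobserved sequences as extra internal nodes and keeps the cheapest spanning tree. The function names (`hamming`, `mst_weight`, `parsimony_score`) and the tuple encoding of sequences are assumptions made for this example.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of sites at which two equal-length 0/1 tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def mst_weight(nodes):
    """Weight of a minimum spanning tree under Hamming distance (Prim)."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0
    # best[v] = cheapest known connection from v to the growing tree
    best = {v: hamming(nodes[0], v) for v in nodes[1:]}
    total = 0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], hamming(v, u))
    return total


def parsimony_score(taxa):
    """Minimum number of mutations in any tree connecting the taxa.

    Exhaustive search over all subsets of unobserved sequences, so the
    running time is exponential in 2**m for m sites. Illustration only.
    """
    taxa = set(taxa)
    m = len(next(iter(taxa)))
    candidates = [v for v in product((0, 1), repeat=m) if v not in taxa]
    best = mst_weight(taxa)  # tree that uses observed sequences only
    for k in range(1, len(candidates) + 1):
        for extra in combinations(candidates, k):
            best = min(best, mst_weight(taxa | set(extra)))
    return best
```

For the four sequences `(0,0,0)`, `(1,1,0)`, `(1,0,1)`, `(0,1,1)`, every pair differs at two sites, so a tree over the observed sequences alone costs 6 mutations, while `parsimony_score` returns 5 by routing through an unobserved intermediate such as `(1,1,1)`. The gap between this exhaustive search and practical instance sizes is what motivates an exact formulation that a solver can prune.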

Editor information

Ion Măndoiu, Alexander Zelikovsky

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sridhar, S., Lam, F., Blelloch, G.E., Ravi, R., Schwartz, R. (2007). Efficiently Finding the Most Parsimonious Phylogenetic Tree Via Linear Programming. In: Măndoiu, I., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2007. Lecture Notes in Computer Science, vol 4463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72031-7_4

  • DOI: https://doi.org/10.1007/978-3-540-72031-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72030-0

  • Online ISBN: 978-3-540-72031-7

  • eBook Packages: Computer Science (R0)
