Abstract
Reconstruction of phylogenetic trees is a fundamental problem in computational biology. While excellent heuristic methods are available for many variants of this problem, new advances in phylogeny inference will be required if we are to continue making effective use of the rapidly growing stores of variation data now being gathered. In this paper, we introduce an integer linear programming formulation to find the most parsimonious phylogenetic tree from a set of binary variation data. The method uses a flow-based formulation that may require exponentially many variables and constraints in the worst case. In practice, however, the method has proved extremely efficient on datasets that are well beyond the reach of the available provably efficient methods. The program solves several large mtDNA and Y-chromosome instances within a few seconds, giving provably optimal results in times competitive with fast heuristics that cannot guarantee optimality.
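To make the optimization target concrete, the sketch below computes the length of a most parsimonious tree for binary data by brute force. It is not the paper's flow-based ILP. It relies on the classic view of binary maximum parsimony as a Steiner tree problem in the Hamming hypercube: unobserved ancestral sequences act as Steiner points, and because Hamming distance is the shortest-path metric of the hypercube, the optimum equals the smallest minimum spanning tree over the taxa plus at most n - 2 extra sequences. The function names `hamming`, `mst_weight`, and `parsimony_length` are hypothetical, and the exhaustive search is only feasible for a handful of characters, which is exactly why an efficient exact method matters.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    # Number of binary characters that differ between two sequences.
    return sum(x != y for x, y in zip(a, b))


def mst_weight(nodes: list[str]) -> int:
    # Prim's algorithm on the complete graph with Hamming edge weights.
    # Assumes the nodes are distinct, equal-length binary strings.
    if len(nodes) <= 1:
        return 0
    # best[v] = cheapest edge from v to the tree built so far.
    best = {v: hamming(nodes[0], v) for v in nodes[1:]}
    total = 0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], hamming(v, u))
    return total


def parsimony_length(taxa: list[str]) -> int:
    # Hypothetical helper: minimum number of character changes over all
    # unrooted trees whose internal nodes may be unobserved sequences.
    terminals = sorted(set(taxa))
    observed = set(terminals)
    m = len(terminals[0])
    candidates = [
        "".join(bits)
        for bits in product("01", repeat=m)
        if "".join(bits) not in observed
    ]
    best = mst_weight(terminals)
    # An optimal Steiner tree on n terminals needs at most n - 2 Steiner points.
    for k in range(1, max(len(terminals) - 2, 0) + 1):
        for steiner in combinations(candidates, k):
            best = min(best, mst_weight(terminals + list(steiner)))
    return best
```

For example, the taxa `110`, `101`, and `011` are pairwise two changes apart, so a tree on the observed sequences alone costs 4, but adding the unobserved ancestor `111` gives a tree with one change per edge and a total length of 3.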
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sridhar, S., Lam, F., Blelloch, G.E., Ravi, R., Schwartz, R. (2007). Efficiently Finding the Most Parsimonious Phylogenetic Tree Via Linear Programming. In: Măndoiu, I., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2007. Lecture Notes in Computer Science, vol 4463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72031-7_4
DOI: https://doi.org/10.1007/978-3-540-72031-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72030-0
Online ISBN: 978-3-540-72031-7
eBook Packages: Computer Science, Computer Science (R0)