Abstract
Several central, well-known combinatorial problems in phylogenetics and population genetics have efficient, elegant solutions when the input is complete or consists of haplotype data, but lack efficient solutions when the input is incomplete, consists of genotype data, or when the problem is generalized from a decision question to an optimization question. Unfortunately, these harder problems arise very often in biological applications. Previous research has shown that integer linear programming (ILP) can sometimes solve hard problems in practice on a range of data that is realistic for current biological applications. Here, we describe a set of related ILP formulations for several additional problems, most of which are known to be NP-hard. These formulations either address the issue of missing data or solve haplotype inference problems with objective functions that model more complex biological phenomena than previous formulations. They solve efficiently on data whose composition reflects a range of data of current biological interest. We also assess the biological quality of the ILP solutions: some of the problems, although not all, solve with excellent quality. These results give a practical way to solve instances of some central, hard biological problems, and practical ways to assess how well certain natural objective functions reflect complex biological phenomena. Perl code to generate the ILPs (for input to CPLEX) is on the web at wwwcsif.cs.ucdavis.edu/ gusfield.
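To make the genotype-versus-haplotype distinction concrete, the following is a minimal sketch of a haplotype inference instance, solved by exhaustive search rather than by ILP. It is not the authors' formulation or their Perl code. The 0/1/2 genotype encoding (2 marks a heterozygous site), the "fewest distinct haplotypes" objective, and the function names `resolutions` and `fewest_haplotypes` are all assumptions chosen for illustration.

```python
from itertools import product


def resolutions(genotype):
    """Yield every unordered haplotype pair (h1, h2) that explains a genotype.

    Assumed encoding: 0 and 1 are homozygous sites, 2 is a heterozygous site.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    if not het_sites:
        # No ambiguity: both haplotypes equal the genotype itself.
        h = tuple(genotype)
        yield (h, h)
        return
    # Pin the first heterozygous site of h1 to 0 so each unordered pair
    # is generated exactly once.
    first, rest = het_sites[0], het_sites[1:]
    for bits in product((0, 1), repeat=len(rest)):
        h1 = list(genotype)
        h2 = list(genotype)
        h1[first], h2[first] = 0, 1
        for site, b in zip(rest, bits):
            h1[site], h2[site] = b, 1 - b
        yield (tuple(h1), tuple(h2))


def fewest_haplotypes(genotypes):
    """Exhaustively pick one resolution per genotype, minimizing the number
    of distinct haplotypes used (illustrative objective, exponential time)."""
    best = None
    for choice in product(*(list(resolutions(g)) for g in genotypes)):
        used = {h for pair in choice for h in pair}
        if best is None or len(used) < len(best[0]):
            best = (used, choice)
    return best


if __name__ == "__main__":
    used, choice = fewest_haplotypes([(2, 2), (0, 2), (1, 1)])
    print(len(used), sorted(used))
```

A genotype with k heterozygous sites has 2^(k-1) candidate resolutions, so this search grows exponentially with the input. That growth is what motivates handing such problems to an ILP solver instead.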
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gusfield, D., Frid, Y., Brown, D. (2007). Integer Programming Formulations and Computations Solving Phylogenetic and Population Genetic Problems with Missing or Genotypic Data. In: Lin, G. (eds) Computing and Combinatorics. COCOON 2007. Lecture Notes in Computer Science, vol 4598. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73545-8_8
DOI: https://doi.org/10.1007/978-3-540-73545-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73544-1
Online ISBN: 978-3-540-73545-8
eBook Packages: Computer Science, Computer Science (R0)