
Haplotype inference with pseudo-Boolean optimization

Annals of Operations Research

Abstract

The rapid development of sequencing techniques in recent years has created an urgent need for efficient and accurate haplotype inference tools. Besides being a crucial issue in genetics, haplotype inference is also a challenging computational problem. Pure parsimony is one viable modeling approach to haplotype inference and is an interesting NP-hard problem in its own right. Recently, the introduction of SAT-based methods, including pseudo-Boolean optimization (PBO) methods, has produced very efficient solvers. This paper provides a detailed description of RPoly, a PBO approach to the haplotype inference by pure parsimony (HIPP) problem. Moreover, an extensive evaluation of existing HIPP solvers on a comprehensive set of instances confirms that RPoly is currently the most efficient and robust HIPP approach.
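To make the HIPP objective concrete, the following is a minimal brute-force sketch in Python. It is not RPoly and does not use any PBO or SAT machinery. It assumes the encoding commonly used in the HIPP literature, which the abstract itself does not state: each genotype site is 0 or 1 when homozygous and 2 when heterozygous, and a pair of haplotypes explains a genotype if both agree with every homozygous site and differ at every heterozygous site. The function names are hypothetical, and the exhaustive search is only practical for toy inputs.

```python
from itertools import combinations, product


def compatible_haplotypes(genotype):
    """Return every haplotype that could be one member of a pair explaining `genotype`.

    Assumed encoding: 0/1 = homozygous site, 2 = heterozygous site.
    """
    options = [(0, 1) if site == 2 else (site,) for site in genotype]
    return set(product(*options))


def explains(h1, h2, genotype):
    """Check whether the haplotype pair (h1, h2) explains `genotype`."""
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:  # heterozygous site needs two different alleles
                return False
        elif a != g or b != g:  # homozygous site needs both alleles equal to g
            return False
    return True


def hipp_brute_force(genotypes):
    """Return a smallest list of haplotypes explaining all genotypes (exhaustive search)."""
    candidates = sorted(set().union(*(compatible_haplotypes(g) for g in genotypes)))
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if all(
                any(explains(h1, h2, g) for h1 in subset for h2 in subset)
                for g in genotypes
            ):
                return list(subset)
    return []


if __name__ == "__main__":
    toy_genotypes = [(0, 2, 2), (2, 2, 1), (2, 2, 2)]
    solution = hipp_brute_force(toy_genotypes)
    print(len(solution), solution)
```

The search tries candidate sets in order of increasing size, so the first set that explains every genotype is a pure parsimony solution. The number of candidate sets grows exponentially with the number of heterozygous sites, which is why exact approaches such as the PBO formulation described in the paper are needed for realistic instances.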



Author information


Corresponding author

Correspondence to Ana Graça.


About this article

Cite this article

Graça, A., Marques-Silva, J., Lynce, I. et al. Haplotype inference with pseudo-Boolean optimization. Ann Oper Res 184, 137–162 (2011). https://doi.org/10.1007/s10479-009-0675-4


  • DOI: https://doi.org/10.1007/s10479-009-0675-4
