
Haplotype Inference Using Propositional Satisfiability

Mathematical Approaches to Polymer Sequence Analysis and Related Problems

Abstract

Haplotype inference is an important problem in computational biology that has received considerable attention in recent years. Haplotypes encode the genetic data of an individual at a single chromosome. However, humans are diploid (each chromosome comes in a homologous pair, one of maternal and one of paternal origin), and it is technologically infeasible to separate the information from homologous chromosomes, so genotyping yields only the combined genotype rather than the two haplotypes. Hence, mathematical methods are required to solve the haplotype inference problem. One relevant approach is pure parsimony. Haplotype inference by pure parsimony (HIPP) aims to find the minimum number of haplotypes that explains a given set of genotypes. This problem is NP-hard. Boolean satisfiability (SAT) has been applied successfully in several fields, and SAT-based techniques for pure parsimony haplotyping have been shown to be very efficient. This chapter describes the haplotype inference problem and the SAT-based models developed to solve it. Experimental results confirm that the SAT-based methods represent the state of the art in HIPP.
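To make the HIPP objective concrete, here is a minimal Python sketch. It assumes a common genotype encoding that this abstract does not specify: each site is '0' or '1' when homozygous and '2' when heterozygous, and a haplotype is a string over '0'/'1'. Under that assumption, a pair of haplotypes explains a genotype when both agree with every homozygous site and differ at every heterozygous site. The helper names (`explains`, `compatible_haplotypes`, `hipp_brute_force`) are hypothetical. The search is plain exhaustive enumeration with exponential cost, not the SAT-based models described in the chapter. It only illustrates what is being minimized by trying r = 1, 2, ... haplotypes in turn.

```python
from itertools import combinations, combinations_with_replacement, product


def explains(h1: str, h2: str, g: str) -> bool:
    """True if haplotypes h1 and h2 explain genotype g.

    Assumed encoding (not from the chapter): '0'/'1' = homozygous site,
    '2' = heterozygous site.
    """
    for a, b, s in zip(h1, h2, g):
        if s == '2':
            if a == b:              # heterozygous site needs differing alleles
                return False
        elif not (a == b == s):     # homozygous site needs both alleles equal to s
            return False
    return True


def compatible_haplotypes(g: str) -> set[str]:
    """All haplotypes that can appear in some pair explaining g."""
    choices = [('0', '1') if s == '2' else (s,) for s in g]
    return {''.join(c) for c in product(*choices)}


def hipp_brute_force(genotypes: list[str]) -> list[str]:
    """Smallest haplotype set explaining every genotype (exhaustive search)."""
    candidates = sorted(set().union(*(compatible_haplotypes(g) for g in genotypes)))
    for r in range(1, len(candidates) + 1):
        for hs in combinations(candidates, r):
            # with_replacement lets a fully homozygous genotype use one haplotype twice
            if all(any(explains(a, b, g)
                       for a, b in combinations_with_replacement(hs, 2))
                   for g in genotypes):
                return list(hs)
    return []
```

For example, `hipp_brute_force(["02", "22", "20"])` returns `['00', '01', '10']`: three haplotypes suffice because the pair ('01', '10') already explains the doubly heterozygous genotype '22'. Because the number of candidate sets grows exponentially, this approach is only usable on toy inputs. SAT-based approaches typically encode the question "is there a solution with r haplotypes?" as a propositional formula and hand it to a solver instead of enumerating.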




Acknowledgments

This research was funded by Fundação para a Ciência e Tecnologia under research project SHIPs (PTDC/EIA/64164/2006) and PhD grant (SFRH/BD/28599/2006), and by Microsoft under contract 2007-017 of the Microsoft Research PhD Scholarship Programme.

Author information


Correspondence to Ana Graça.


Copyright information

© 2011 Springer New York

About this chapter

Cite this chapter

Graça, A., Marques-Silva, J., Lynce, I. (2011). Haplotype Inference Using Propositional Satisfiability. In: Bruni, R. (eds) Mathematical Approaches to Polymer Sequence Analysis and Related Problems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6800-5_7
