Abstract
Haplotype inference is an important problem in computational biology that has received considerable effort and attention in recent years. Haplotypes encode the genetic data of an individual at a single chromosome. However, humans are diploid (each chromosome pair has one copy of maternal and one of paternal origin), and it is technologically infeasible to separate the information from homologous chromosomes. Hence, mathematical methods are required to solve the haplotype inference problem. One relevant approach is pure parsimony. Haplotype inference by pure parsimony (HIPP) aims at finding the minimum number of haplotypes that explain a given set of genotypes. This problem is NP-hard. Boolean satisfiability (SAT) has been applied successfully in several fields, and SAT-based techniques have proven very efficient for pure parsimony haplotyping. This chapter describes the haplotype inference problem and the SAT-based models developed to solve it. Experimental results confirm that SAT-based methods represent the state of the art in the field of HIPP.
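To make the pure parsimony objective concrete, the following is a minimal sketch of HIPP on a toy instance. It is an exhaustive search for illustration only, not the SAT-based models the chapter describes. The encoding is an assumption: each genotype site is 0 or 1 when homozygous and 2 when heterozygous, and each haplotype site is 0 or 1. The function names are hypothetical.

```python
from itertools import combinations, product


def explains(h1, h2, g):
    """Return True if the haplotype pair (h1, h2) explains genotype g.

    Assumed encoding: a genotype site is 0 or 1 (homozygous, both
    haplotypes carry that value) or 2 (heterozygous, haplotypes differ).
    """
    for a, b, s in zip(h1, h2, g):
        if s == 2:
            if a == b:
                return False
        elif not (a == b == s):
            return False
    return True


def candidate_haplotypes(g):
    """All haplotypes that can take part in some pair explaining g."""
    options = [(0, 1) if s == 2 else (s,) for s in g]
    return set(product(*options))


def hipp_brute_force(genotypes):
    """Smallest haplotype set explaining every genotype (exponential time)."""
    candidates = sorted(
        set().union(*(candidate_haplotypes(g) for g in genotypes))
    )
    # Try haplotype sets of increasing size; the first that works is minimum.
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(
                any(explains(h1, h2, g) for h1 in subset for h2 in subset)
                for g in genotypes
            ):
                return list(subset)
    return []


genotypes = [(2, 2), (0, 2), (2, 0)]
solution = hipp_brute_force(genotypes)
print(len(solution), solution)
```

On this instance three haplotypes are enough for three genotypes, because haplotypes are shared between individuals. The search grows exponentially with the number of heterozygous sites, which is consistent with the problem being NP-hard and is why dedicated solvers are used in practice.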
Acknowledgments
This research was funded by Fundação para a Ciência e Tecnologia under research project SHIPs (PTDC/EIA/64164/2006) and PhD grant (SFRH/BD/28599/2006), and by Microsoft under contract 2007-017 of the Microsoft Research PhD Scholarship Programme.
Copyright information
© 2011 Springer New York
About this chapter
Cite this chapter
Graça, A., Marques-Silva, J., Lynce, I. (2011). Haplotype Inference Using Propositional Satisfiability. In: Bruni, R. (eds) Mathematical Approaches to Polymer Sequence Analysis and Related Problems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6800-5_7
DOI: https://doi.org/10.1007/978-1-4419-6800-5_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6799-2
Online ISBN: 978-1-4419-6800-5
eBook Packages: Biomedical and Life Sciences (R0)