Abstract
Haplotype inference from genotype data is a key step towards a better understanding of the role that genetic variation plays in inherited diseases. One of the most promising approaches uses the pure parsimony criterion. This approach, called Haplotype Inference by Pure Parsimony (HIPP), aims to minimise the number of haplotypes required to explain a given set of genotypes, and is NP-hard. The HIPP problem is often solved using constraint satisfaction techniques, for which the upper bound on the number of required haplotypes is a key issue. Another very well-known approach is Clark’s method, which resolves genotypes by greedily selecting an explaining pair of haplotypes. In this work, we combine the basic idea of Clark’s method with a more sophisticated method for the selection of explaining haplotypes, in order to explicitly introduce a bias towards parsimonious explanations. This new algorithm can be used either to obtain an approximate solution to the HIPP problem or to obtain an upper bound on the size of the pure parsimony solution. This upper bound can then be used to efficiently encode the problem as a constraint satisfaction problem. The experimental evaluation, conducted using a large set of real and artificially generated examples, shows that the new method is much more effective than Clark’s method at obtaining parsimonious solutions, while retaining the simplicity and speed of Clark’s method.
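To make the notions of "explaining pair" and greedy resolution concrete, the following is a minimal sketch of a Clark-style procedure. It is not the delayed haplotype selection algorithm of this paper. It assumes the common encoding in which each genotype site is 0 or 1 (homozygous) or 2 (heterozygous) and each haplotype site is 0 or 1; this encoding and the function names `explains`, `complement`, and `clark_greedy` are illustrative assumptions and do not come from the abstract.

```python
from typing import Dict, List, Optional, Tuple

Haplotype = Tuple[int, ...]  # assumed encoding: each site is 0 or 1
Genotype = Tuple[int, ...]   # assumed encoding: 0/1 homozygous, 2 heterozygous


def explains(h1: Haplotype, h2: Haplotype, g: Genotype) -> bool:
    """Return True if the pair (h1, h2) explains genotype g."""
    for a, b, s in zip(h1, h2, g):
        if s == 2:
            if a == b:  # a heterozygous site needs differing alleles
                return False
        elif a != s or b != s:  # a homozygous site needs both alleles equal to s
            return False
    return True


def complement(h: Haplotype, g: Genotype) -> Optional[Haplotype]:
    """Return the haplotype that pairs with h to explain g, or None."""
    mate = []
    for a, s in zip(h, g):
        if s == 2:
            mate.append(1 - a)
        elif a == s:
            mate.append(s)
        else:
            return None  # h disagrees with g at a homozygous site
    return tuple(mate)


def clark_greedy(
    genotypes: List[Genotype],
) -> Tuple[Dict[int, Tuple[Haplotype, Haplotype]], List[Haplotype]]:
    """Greedy Clark-style resolution; some genotypes may stay unresolved."""
    known: List[Haplotype] = []
    resolved: Dict[int, Tuple[Haplotype, Haplotype]] = {}

    def add(h: Haplotype) -> None:
        if h not in known:
            known.append(h)

    # Seed with genotypes that have at most one heterozygous site:
    # their explaining pair is uniquely determined.
    for i, g in enumerate(genotypes):
        if g.count(2) <= 1:
            h1 = tuple(0 if s == 2 else s for s in g)
            h2 = tuple(1 if s == 2 else s for s in g)
            resolved[i] = (h1, h2)
            add(h1)
            add(h2)

    # Repeatedly resolve a genotype using the first compatible known haplotype.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in known:
                mate = complement(h, g)
                if mate is not None:
                    resolved[i] = (h, mate)
                    add(mate)
                    progress = True
                    break
    return resolved, known


genotypes = [(0, 1, 0), (2, 1, 0), (2, 2, 0)]
resolved, known = clark_greedy(genotypes)
print(len(known), "distinct haplotypes used")
```

The number of distinct haplotypes returned by such a greedy pass is an upper bound on the pure parsimony optimum whenever every genotype is resolved. The result depends on the order in which genotypes and haplotypes are tried, which is the kind of choice a selection strategy biased towards parsimony aims to improve.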
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marques-Silva, J., Lynce, I., Graça, A., Oliveira, A.L. (2007). Efficient and Tight Upper Bounds for Haplotype Inference by Pure Parsimony Using Delayed Haplotype Selection. In: Neves, J., Santos, M.F., Machado, J.M. (eds) Progress in Artificial Intelligence. EPIA 2007. Lecture Notes in Computer Science, vol 4874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77002-2_52
DOI: https://doi.org/10.1007/978-3-540-77002-2_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77000-8
Online ISBN: 978-3-540-77002-2
eBook Packages: Computer Science (R0)