Abstract
Existing genotyping technologies have enabled researchers to genotype hundreds of thousands of SNPs efficiently and inexpensively. Methods for imputing non-genotyped SNPs and for inferring haplotype information from genotypes nevertheless remain important, since they can increase the power of statistical association tests. Many studies are conducted on sets of individuals for which pedigree information is relevant; this information can be used to increase the power of tests and to reduce the impact of population structure on the results. This paper proposes a new Boolean optimization model for haplotype inference that combines two combinatorial approaches: the Minimum Recombinant Haplotyping Configuration (MRHC), which minimizes the number of recombination events within a pedigree, and Haplotype Inference by Pure Parsimony (HIPP), which seeks a solution with the minimum number of distinct haplotypes within a population. The paper also describes well-known techniques that yield significant performance gains, including symmetry breaking, the identification of lower bounds, and the use of an appropriate constraint solver. Experimental results show that the new PedRPoly model is competitive in terms of both accuracy and efficiency.
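To make the two objectives concrete, the following Python sketch phases a single parent-offspring trio by brute force. It is not the PedRPoly Boolean model: it simply enumerates every phasing of the three genotypes, discards combinations that violate Mendelian inheritance, and ranks the rest lexicographically, first by the MRHC criterion (total recombination events) and then by the HIPP criterion (number of distinct haplotypes). The lexicographic ordering, the 0/1/2 genotype encoding (2 marks a heterozygous site), the absence of missing data, and the function names `resolutions`, `recombinations`, and `phase_trio` are all illustrative assumptions.

```python
from itertools import product

INF = float("inf")


def resolutions(genotype):
    """Return every unordered haplotype pair that explains a genotype.

    Assumed encoding: '0'/'1' = homozygous site, '2' = heterozygous site.
    """
    het = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


def recombinations(parent, inherited):
    """Minimum number of switches between a parent's two haplotypes needed
    to reproduce the inherited haplotype, or None if no copy fits."""
    # cost[k]: fewest switches so far, currently copying parental haplotype k
    cost = [0 if parent[k][0] == inherited[0] else INF for k in (0, 1)]
    for site in range(1, len(inherited)):
        cost = [
            min(cost[k], cost[1 - k] + 1)
            if parent[k][site] == inherited[site] else INF
            for k in (0, 1)
        ]
    best = min(cost)
    return None if best == INF else best


def phase_trio(father, mother, child):
    """Hypothetical brute-force trio phasing: minimize total recombinations
    (MRHC), then break ties by the number of distinct haplotypes (HIPP)."""
    best = None
    for f, m, c in product(resolutions(father), resolutions(mother),
                           resolutions(child)):
        # Either child haplotype may be the paternally inherited one.
        for pat, mat in (c, c[::-1]):
            rf = recombinations(f, pat)
            rm = recombinations(m, mat)
            if rf is None or rm is None:
                continue  # some allele cannot come from that parent
            distinct = len(set(f) | set(m) | {pat, mat})
            score = (rf + rm, distinct)
            if best is None or score < best[0]:
                best = (score, f, m, (pat, mat))
    return best
```

For example, `phase_trio("22", "00", "20")` returns `((0, 3), ("01", "10"), ("00", "00"), ("10", "00"))`: phasing the father as ("01", "10") lets the child inherit "10" with no recombination, whereas the alternative phasing ("00", "11") would require one. The enumeration grows exponentially with the number of heterozygous sites and pedigree members, which is one motivation for encoding the problem as a Boolean optimization model and handing it to a constraint solver, as the abstract describes.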
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Graça, A., Lynce, I., Marques-Silva, J., Oliveira, A.L. (2012). Efficient and Accurate Haplotype Inference by Combining Parsimony and Pedigree Information. In: Horimoto, K., Nakatsui, M., Popov, N. (eds) Algebraic and Numeric Biology. Lecture Notes in Computer Science, vol 6479. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28067-2_3
DOI: https://doi.org/10.1007/978-3-642-28067-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28066-5
Online ISBN: 978-3-642-28067-2
eBook Packages: Computer Science, Computer Science (R0)