Hap-seq: An Optimal Algorithm for Haplotype Phasing with Imputation Using Sequencing Data

He, Dan; Han, Buhm; Eskin, Eleazar

doi:10.1007/978-3-642-29627-7_8

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7262)

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology

Abstract

Inference of haplotypes, or the sequence of alleles along each chromosome, is a fundamental problem in genetics and is important for many analyses, including admixture mapping, identification of regions of identity by descent, and imputation. Traditionally, haplotypes were inferred from genotype data obtained from microarrays, using population haplotype frequencies estimated from either a large sample of genotyped individuals or a reference dataset such as the HapMap. With the availability of large reference datasets, modern approaches to haplotype phasing along these lines are closely related to imputation methods. When applied to data obtained from sequencing studies, a straightforward way to obtain haplotypes is to first infer genotypes from the sequence data and then apply an imputation method. However, this approach ignores the fact that alleles on the same sequence read originate from the same chromosome. Haplotype assembly approaches take advantage of this insight and predict haplotypes by assigning the reads to chromosomes in a way that minimizes the number of conflicts between the reads and the predicted haplotypes. Unfortunately, assembly approaches require very high sequencing coverage and are usually not able to fully reconstruct the haplotypes. In this work, we present a novel approach, Hap-seq, which is simultaneously an imputation and an assembly method: it combines information from a reference dataset with information from the reads in a likelihood framework. Our method applies a dynamic programming algorithm to identify the predicted haplotype that maximizes the joint likelihood of the haplotype with respect to the reference dataset and with respect to the observed reads. We show that our method requires only low sequencing coverage and can reconstruct haplotypes containing both common and rare alleles with higher accuracy than state-of-the-art imputation methods.
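
The assembly objective mentioned in the abstract (assign reads to chromosomes so that conflicts with the predicted haplotypes are minimized) can be illustrated with a small brute-force sketch. This is not the dynamic programming algorithm of Hap-seq and it uses no reference panel or likelihood. It assumes, for illustration only, that every site is heterozygous with alleles coded 0/1, so the second haplotype is the complement of the first. The names `conflicts`, `assembly_cost`, `best_haplotype`, and the toy `reads` are hypothetical.

```python
from itertools import product

def conflicts(read, hap):
    """Count sites where a read disagrees with a haplotype.
    A read is a dict mapping SNP index -> observed allele (0 or 1)."""
    return sum(1 for pos, allele in read.items() if hap[pos] != allele)

def assembly_cost(reads, hap):
    """Total conflicts when each read is assigned to the chromosome copy
    it fits best. Assumption: all sites are heterozygous, so the second
    haplotype is the complement of the first."""
    comp = tuple(1 - a for a in hap)
    return sum(min(conflicts(r, hap), conflicts(r, comp)) for r in reads)

def best_haplotype(reads, n_snps):
    """Exhaustive search over haplotypes; feasible only for tiny n_snps.
    Fixing the first allele to 0 avoids counting a haplotype and its
    complement as two different solutions."""
    candidates = (h for h in product((0, 1), repeat=n_snps) if h[0] == 0)
    return min(candidates, key=lambda h: assembly_cost(reads, h))

# Toy data: five consistent reads over four SNPs plus one read with an error.
reads = [
    {0: 0, 1: 0},
    {1: 0, 2: 1},
    {0: 1, 1: 1},
    {2: 0, 3: 1},
    {2: 1, 3: 0},
    {0: 0, 1: 1},  # conflicts with either chromosome copy at one site
]

hap = best_haplotype(reads, 4)
print(hap, assembly_cost(reads, hap))
```

On this toy input the search settles on the haplotype supported by the five consistent reads and charges a single conflict for the erroneous one. With low coverage, many SNP pairs are never linked by a read and the minimum is no longer unique, which is the gap the abstract says a reference dataset is meant to fill.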

Author information

Authors and Affiliations

Computer Science Dept., Univ. of California, Los Angeles, CA, 90095-1596, USA
Dan He, Buhm Han & Eleazar Eskin

Editor information

Editors and Affiliations

Computer Science Department, Tel-Aviv University, 69978, Tel-Aviv, Israel
Benny Chor

About this paper

Cite this paper

He, D., Han, B., Eskin, E. (2012). Hap-seq: An Optimal Algorithm for Haplotype Phasing with Imputation Using Sequencing Data. In: Chor, B. (ed.) Research in Computational Molecular Biology. RECOMB 2012. Lecture Notes in Computer Science, vol 7262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29627-7_8

DOI: https://doi.org/10.1007/978-3-642-29627-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29626-0
Online ISBN: 978-3-642-29627-7
eBook Packages: Computer Science, Computer Science (R0)