Abstract
Human genomes come in pairs: every individual inherits one version of the genome from the mother and another version from the father. Hence, every chromosome exists in two similar yet distinct “copies”, called haplotypes. The problem of determining the full sequences of both haplotypes is known as phasing or haplotyping. In this paper, we review different approaches for haplotyping and point out how they are formalized as optimization problems. We survey different technologies and, in this way, provide guidance on the characteristics of problem instances resulting from present-day technologies. Furthermore, we highlight open algorithmic challenges.
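To make the phasing problem concrete, the following minimal sketch enumerates every pair of haplotypes that is consistent with a single unphased genotype. It is an illustration only and not a method from the paper: the 0/1/2 genotype encoding (number of alternative alleles per site) and the function name consistent_haplotype_pairs are assumptions introduced here.

```python
from itertools import product


def consistent_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    Assumed encoding (hypothetical, for illustration): each entry is the number
    of alternative alleles at a biallelic site, i.e. 0 = homozygous reference,
    1 = heterozygous, 2 = homozygous alternative.
    """
    if any(g not in (0, 1, 2) for g in genotype):
        raise ValueError("genotype entries must be 0, 1, or 2")

    # Homozygous sites are unambiguous: both haplotypes carry the same allele.
    base = [g // 2 for g in genotype]
    het_sites = [i for i, g in enumerate(genotype) if g == 1]

    if not het_sites:
        haplotype = tuple(base)
        return [(haplotype, haplotype)]

    pairs = []
    # Fix the allele at the first heterozygous site to 0 on the first haplotype
    # so that each unordered pair is produced exactly once.
    for rest in product((0, 1), repeat=len(het_sites) - 1):
        first, second = list(base), list(base)
        for site, allele in zip(het_sites, (0,) + rest):
            first[site] = allele
            second[site] = 1 - allele
        pairs.append((tuple(first), tuple(second)))
    return pairs


if __name__ == "__main__":
    for h1, h2 in consistent_haplotype_pairs([1, 2, 1, 0, 1]):
        print(h1, h2)
```

A genotype with k heterozygous sites admits 2^(k-1) distinct haplotype pairs, so the genotype alone cannot determine the phase; additional evidence such as population data, pedigrees, or sequencing reads is needed to choose among the candidates.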
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Klau, G.W., Marschall, T. (2017). A Guided Tour to Computational Haplotyping. In: Kari, J., Manea, F., Petre, I. (eds) Unveiling Dynamics and Complexity. CiE 2017. Lecture Notes in Computer Science, vol 10307. Springer, Cham. https://doi.org/10.1007/978-3-319-58741-7_6
DOI: https://doi.org/10.1007/978-3-319-58741-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58740-0
Online ISBN: 978-3-319-58741-7
eBook Packages: Computer Science, Computer Science (R0)