Haplotype and Repeat Separation in Long Reads

Tischler-Höhle, German

doi:10.1007/978-3-030-14160-8_11

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10834)

Included in the following conference series:

International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics

Abstract

Resolving the correct structure and succession of highly similar sequence stretches is one of the main open problems in genome assembly. For non-haploid genomes, this includes determining the sequences of the different haplotypes. For all but the smallest genomes, it also involves separating different repeat instances. In this paper, we discuss methods for resolving such problems in third-generation long reads by classifying alignments between long reads according to whether they represent true or false read overlaps. The main difficulty in this context is the high error rate of such reads, which greatly exceeds the variation between the similar regions we want to separate. Our methods can separate read classes stemming from regions that differ by as little as 1%.
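To make the signal-to-noise problem in the abstract concrete, the following is a minimal back-of-the-envelope sketch, not the method of the paper. It assumes independent per-base errors, ignores coincidentally matching errors, and counts every diverged column as a disagreement. The function names and the 15% per-read error rate used in the example call are illustrative assumptions; only the 1% difference comes from the abstract.

```python
import math


def mismatch_rate(read_error: float, divergence: float = 0.0) -> float:
    """Approximate per-column disagreement rate between two aligned reads.

    Simplifying assumptions (hypothetical model, not from the paper):
    errors are independent, and two erroneous bases never agree by chance.
    """
    # On an identical source column the reads agree only if neither is in error.
    noise = 1.0 - (1.0 - read_error) ** 2
    # A diverged source column is counted as a disagreement outright.
    return divergence + (1.0 - divergence) * noise


def separation_in_sd(read_error: float, divergence: float, overlap_len: int) -> float:
    """Gap between false- and true-overlap mismatch counts, in standard deviations."""
    p_true = mismatch_rate(read_error)               # reads from the same region
    p_false = mismatch_rate(read_error, divergence)  # reads from similar regions
    gap = overlap_len * (p_false - p_true)
    # Binomial standard deviation of the mismatch count in a true overlap.
    sd = math.sqrt(overlap_len * p_true * (1.0 - p_true))
    return gap / sd


if __name__ == "__main__":
    # Assumed values: 15% per-read error, 1% divergence, 10,000 aligned columns.
    print(mismatch_rate(0.15), mismatch_rate(0.15, 0.01))
    print(separation_in_sd(0.15, 0.01, 10_000))
```

Under these assumed values, the overall mismatch count of a false overlap exceeds that of a true overlap by only about 1.6 standard deviations, which suggests why overall alignment identity alone is a weak classifier and why more targeted evidence is needed.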

Acknowledgments

We thank Gene Myers for interesting algorithmic discussions related to this paper and Shilpa Garg for advice on running WhatsHap.

Author information

Authors and Affiliations

Myers Lab, Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, Dresden, Germany
German Tischler-Höhle

Corresponding author

Correspondence to German Tischler-Höhle.

Editor information

Editors and Affiliations

University of Cagliari, Cagliari, Italy
Massimo Bartoletti
University of Genova, Genoa, Italy
Annalisa Barla
University of Stirling, Stirling, UK
Andrea Bracciali
Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
Gunnar W. Klau
Houston Methodist Research Institute, Houston, TX, USA
Leif Peterson
University of Udine, Udine, Italy
Alberto Policriti
University of Salerno, Fisciano, Italy
Roberto Tagliaferri

About this paper

Cite this paper

Tischler-Höhle, G. (2019). Haplotype and Repeat Separation in Long Reads. In: Bartoletti, M., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2017. Lecture Notes in Computer Science, vol 10834. Springer, Cham. https://doi.org/10.1007/978-3-030-14160-8_11

DOI: https://doi.org/10.1007/978-3-030-14160-8_11
Published: 14 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14159-2
Online ISBN: 978-3-030-14160-8
eBook Packages: Computer Science, Computer Science (R0)
