mcPBWT: Space-Efficient Multi-column PBWT Scanning Algorithm for Composite Haplotype Matching

  • Conference paper
Computational Advances in Bio and Medical Sciences (ICCABS 2021)

Abstract

Positional Burrows-Wheeler Transform (PBWT) is a data structure that supports efficient algorithms for finding matching segments in a panel of haplotypes. Composite patterns, in which multiple matching segments or blocks are arranged contiguously along the same haplotype, are of particular interest because they can indicate recombination crossover events, gene-conversion tracts, or, sometimes, errors made by phasing algorithms. However, current PBWT algorithms do not support efficient search for such composite patterns. Here, we present our algorithm, mcPBWT (multi-column PBWT), which runs multiple synchronized PBWT sweeps at different variant sites, providing "look-ahead" information about matches at those sites. This look-ahead information allows us to analyze multiple contiguous matching pairs in a single pass. We present two specific cases of mcPBWT, namely double-PBWT and triple-PBWT, which use two and three PBWT columns, respectively. double-PBWT finds combinations of two matching pairs, representative of a crossover event or a phasing error, while triple-PBWT finds combinations of three matching pairs, representative of a gene-conversion tract.
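For readers new to PBWT, the following is a minimal Python sketch of the standard single-column sweep (Durbin-style prefix and divergence arrays), plus a hypothetical helper that runs two such sweeps in lock-step with one held a fixed number of sites ahead. It only illustrates the general idea of "synchronized runs" under our own assumptions: a biallelic 0/1 panel stored as a list of lists, and the names `pbwt_sweep`, `match_start`, and `lookahead_sweeps` are ours. It is not the authors' mcPBWT implementation and does not itself detect crossover or gene-conversion patterns.

```python
from itertools import islice
from typing import Iterator, List, Sequence, Tuple

# Assumed input layout: panel[h][k] is the allele (0 or 1) of haplotype h at site k.
Panel = Sequence[Sequence[int]]
State = Tuple[List[int], List[int]]  # (prefix array a, divergence array d)


def pbwt_sweep(panel: Panel) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (k, a, d) after site k has been added (Durbin-style update).

    a: haplotype indices sorted by their reversed prefixes over sites 0..k.
    d: d[i] is the first site of the longest match between a[i] and a[i-1]
       that ends at site k; d[i] == k + 1 means they differ at site k
       (d[0] is always k + 1 because a[0] has no predecessor).
    """
    m = len(panel)
    n = len(panel[0]) if m else 0
    a = list(range(m))
    d = [0] * m
    for k in range(n):
        a0, a1, d0, d1 = [], [], [], []
        p = q = k + 1  # running match starts for the 0-block and the 1-block
        for h, dv in zip(a, d):
            p = max(p, dv)
            q = max(q, dv)
            if panel[h][k] == 0:
                a0.append(h)
                d0.append(p)
                p = 0
            else:
                a1.append(h)
                d1.append(q)
                q = 0
        a, d = a0 + a1, d0 + d1  # stable partition: 0-allele block first
        yield k, a, d


def match_start(a: List[int], d: List[int], x: int, y: int) -> int:
    """First site of the longest match between distinct haplotypes x and y
    ending at the current site: the maximum of d over their rank interval."""
    pos = {h: i for i, h in enumerate(a)}
    i, j = sorted((pos[x], pos[y]))
    return max(d[i + 1 : j + 1])


def lookahead_sweeps(panel: Panel, offset: int) -> Iterator[Tuple[int, State, State]]:
    """Hypothetical two-column scan: pair the PBWT state at site k with the
    state at site k + offset by running two independent sweeps in lock-step."""
    lag = pbwt_sweep(panel)
    lead = islice(pbwt_sweep(panel), offset, None)  # leading sweep starts offset sites ahead
    for (k, a, d), (_, a2, d2) in zip(lag, lead):
        yield k, (a, d), (a2, d2)


if __name__ == "__main__":
    demo = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
    for k, (a, d), (a2, d2) in lookahead_sweeps(demo, 2):
        # Match start of haplotypes 0 and 1 at site k and at site k + 2.
        print(k, match_start(a, d, 0, 1), match_start(a2, d2, 0, 1))
    # prints "0 0 3" then "1 0 3"
```

In this sketch each sweep keeps only its current prefix and divergence arrays, so two or three synchronized sweeps need memory proportional to the number of haplotypes per sweep rather than to the whole panel; the cost is that the leading sweep recomputes columns the lagging one will later revisit. `match_start` is a linear-time convenience for the demo; a practical scanner would track match blocks incrementally instead.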

Acknowledgments

PS, AN, DZ and SZ were supported by the National Institutes of Health grant R01 HG010086. AN, DZ and SZ were also supported by the National Institutes of Health grant R56 HG011509. AN and DZ were also supported by the National Institutes of Health grant OT2-OD002751.

Author information

Correspondence to Shaojie Zhang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shakya, P., Naseri, A., Zhi, D., Zhang, S. (2022). mcPBWT: Space-Efficient Multi-column PBWT Scanning Algorithm for Composite Haplotype Matching. In: Bansal, M.S., et al. Computational Advances in Bio and Medical Sciences. ICCABS 2021. Lecture Notes in Computer Science, vol. 13254. Springer, Cham. https://doi.org/10.1007/978-3-031-17531-2_10

  • DOI: https://doi.org/10.1007/978-3-031-17531-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17530-5

  • Online ISBN: 978-3-031-17531-2

  • eBook Packages: Computer Science, Computer Science (R0)
