Abstract
The positional Burrows-Wheeler transform (PBWT) is a data structure that supports efficient algorithms for finding matching segments in a panel of haplotypes. Composite patterns of multiple matching segments, or blocks, arranged contiguously along the same haplotype are of interest because they can indicate recombination crossover events, gene-conversion tracts, or, sometimes, errors of phasing algorithms. However, current PBWT algorithms do not support efficient search for such composite patterns. Here, we present mcPBWT (multi-column PBWT), an algorithm that uses multiple synchronized runs of PBWT at different variant sites to provide “look-ahead” information about matches at those sites. This “look-ahead” information allows us to analyze multiple contiguous matching pairs in a single pass. We present two specific cases of mcPBWT, double-PBWT and triple-PBWT, which use two and three columns of PBWT, respectively. Double-PBWT finds combinations of two matching pairs that are representative of a crossover event or a phasing error, while triple-PBWT finds combinations of three matching pairs that are representative of a gene-conversion tract.
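The idea of synchronized PBWT runs can be illustrated with a minimal Python sketch. It is not the authors' mcPBWT implementation: `pbwt_columns` follows the standard prefix and divergence array update from Durbin (2014), while `lockstep_columns` and its fixed `gap` parameter are hypothetical names introduced here only to show how a leading scan could supply look-ahead information to a lagging scan while keeping just the current columns in memory.

```python
from typing import Iterator, List, Tuple

Column = Tuple[int, List[int], List[int]]


def pbwt_columns(haps: List[List[int]]) -> Iterator[Column]:
    """Yield (k, a, d) after each site k of a biallelic panel.

    a: haplotype indices sorted by reversed prefix ending at site k.
    d: for each position in a, the start site of the match with the
       haplotype directly above it (k + 1 means no match at site k).
    """
    m = len(haps)
    n = len(haps[0]) if m else 0
    a = list(range(m))
    d = [0] * m
    for k in range(n):
        a0, a1, d0, d1 = [], [], [], []
        p = q = k + 1
        for idx, dv in zip(a, d):
            p = max(p, dv)
            q = max(q, dv)
            if haps[idx][k] == 0:
                a0.append(idx)
                d0.append(p)
                p = 0
            else:
                a1.append(idx)
                d1.append(q)
                q = 0
        a, d = a0 + a1, d0 + d1
        yield k, a, d


def lockstep_columns(
    haps: List[List[int]], gap: int
) -> Iterator[Tuple[Column, Column]]:
    """Pair the PBWT column at site k with the column at site k + gap.

    Assumption for illustration: a fixed site gap between the two scans.
    Only the two current columns are held at any time.
    """
    lag = pbwt_columns(haps)
    lead = pbwt_columns(haps)
    for _ in range(gap):
        next(lead, None)  # advance the look-ahead scan
    yield from zip(lag, lead)


if __name__ == "__main__":
    panel = [
        [0, 1, 0, 1],
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 1],
    ]
    for (k, a_lag, _), (j, a_lead, _) in lockstep_columns(panel, gap=2):
        print(f"site {k}: order {a_lag} | look-ahead site {j}: order {a_lead}")
```

Comparing a haplotype's neighbors in the lagging column with its neighbors in the leading column is the kind of information a composite-pattern search would act on; the rule for deciding when two blocks form a crossover or gene-conversion pattern is left out of this sketch.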
Acknowledgments
PS, AN, DZ and SZ were supported by the National Institutes of Health grant R01 HG010086. AN, DZ and SZ were also supported by the National Institutes of Health grant R56 HG011509. AN and DZ were also supported by the National Institutes of Health grant OT2-OD002751.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shakya, P., Naseri, A., Zhi, D., Zhang, S. (2022). mcPBWT: Space-Efficient Multi-column PBWT Scanning Algorithm for Composite Haplotype Matching. In: Bansal, M.S., et al. Computational Advances in Bio and Medical Sciences. ICCABS 2021. Lecture Notes in Computer Science, vol 13254. Springer, Cham. https://doi.org/10.1007/978-3-031-17531-2_10
DOI: https://doi.org/10.1007/978-3-031-17531-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17530-5
Online ISBN: 978-3-031-17531-2
eBook Packages: Computer Science, Computer Science (R0)