Large-Scale Parallel Alignment Algorithm for SMRT Reads

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13156)

Abstract

Single Molecule Real-Time (SMRT) sequencing is one of the most prominent topics in third-generation sequencing technology. Compared with next-generation sequencing technology, SMRT can detect single molecules and produces much longer reads, which also leads to a huge increase in the amount of data. As the performance of a single CPU has reached its bottleneck, single-node computing falls far short of SMRT sequencing requirements. An alternative solution is parallel computing, which runs the alignment algorithm on multiple computing nodes and thus greatly decreases the running time. The Regional Hashing-based Alignment Tool (rHAT) is a novel approach developed especially for SMRT sequencing. It offers better sensitivity and improved correctness compared with existing sequence alignment tools. However, the original rHAT implementation can only run on a single node, which dramatically limits its performance. In this paper, we develop PrHAT, a parallel version of the rHAT sequence aligner. We test PrHAT on the simulated and real datasets that the original rHAT used. Our results show that PrHAT reduces the computing wall-time from nearly an hour to several minutes. As the number of nodes increases from 2 to 16 when aligning large-scale datasets, PrHAT achieves speedups of 1.94–14.87x. The parallel efficiency decreases only from 97% to 93%, and its weak-scaling performance remains almost unchanged. Based on PrHAT, we developed OpenPrHAT. Its performance is similar to that of PrHAT, but it can also run on other computing devices in the platform, such as GPUs. We expect that the implementation of PrHAT will promote the development of SMRT in third-generation sequencing technology.
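
The reported efficiency figures agree with the usual definition of parallel efficiency as speedup divided by the number of nodes. The short sketch below checks that arithmetic. It assumes that the 1.94x and 14.87x speedups correspond to 2 and 16 nodes respectively; the function name is illustrative and is not taken from PrHAT.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Strong-scaling efficiency: speedup divided by node count."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes


# Speedups quoted in the abstract; pairing them with 2 and 16 nodes
# is an assumption made for this check.
reported = {2: 1.94, 16: 14.87}
efficiencies = {n: parallel_efficiency(s, n) for n, s in reported.items()}

for n, e in efficiencies.items():
    print(f"{n} nodes: efficiency {e:.1%}")
```

Under that pairing the values come out at roughly 97% and 93%, matching the range stated in the abstract.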

Acknowledgments

This work was supported by the National Key R&D Program of China (grants 2020YFA0709803 and 2018YFB0204301) and NSFC grant 62102427. The funding bodies did not influence the design of the study, the collection, analysis, or interpretation of data, or the writing of the manuscript.

Author information

Corresponding author

Correspondence to Yingbo Cui.

Appendix

All source code for PrHAT is available at:

https://drive.google.com/drive/folders/1OLjYANWXHz6b22sfdf7Mqv6vm1zilB69?usp=sharing.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Xia, Z. et al. (2022). Large-Scale Parallel Alignment Algorithm for SMRT Reads. In: Lai, Y., Wang, T., Jiang, M., Xu, G., Liang, W., Castiglione, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2021. Lecture Notes in Computer Science, vol 13156. Springer, Cham. https://doi.org/10.1007/978-3-030-95388-1_14

  • DOI: https://doi.org/10.1007/978-3-030-95388-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95387-4

  • Online ISBN: 978-3-030-95388-1

  • eBook Packages: Computer Science (R0)
