
mapAlign: An Efficient Approach for Mapping and Aligning Long Reads to Reference Genomes

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2020)

Abstract

Long reads play an important role in identifying structural variants, sequencing repetitive regions, phasing alleles, etc. In this paper, we propose a new approach for mapping long reads to reference genomes, together with a new method for generating accurate alignments between the long reads and the corresponding segments of the reference genome. The new mapping algorithm is based on the longest common subsequence with distance constraints. The new (local) alignment algorithm is based on the recursive alignment of k-mers of variable size. Experiments show that our method generates better alignments, in terms of both identity and alignment score, on both Nanopore and SMRT data sets. In particular, for human individuals, our method aligns 91.53% and 85.36% of the letters on reads to identical letters on the reference genome for the Nanopore and SMRT data sets, respectively, whereas the state-of-the-art method aligns only 88.44% and 79.08%, respectively. Our method is also faster than the state-of-the-art method.
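The abstract names the mapping step (a longest common subsequence with distance constraints) but does not define it. As a rough illustration only, here is a minimal Python sketch under our own assumptions, not the authors' implementation: exact k-mers shared by the read and the reference act as the "letters" of the common subsequence, and two hypothetical constraints, `max_gap` and `max_diff`, limit how far apart consecutive matches may lie. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Match = Tuple[int, int]  # (position on read, position on reference)


def kmer_matches(read: str, ref: str, k: int) -> List[Match]:
    """Return every (read_pos, ref_pos) pair where read and ref share an exact k-mer."""
    index: Dict[str, List[int]] = {}
    for j in range(len(ref) - k + 1):
        index.setdefault(ref[j:j + k], []).append(j)
    return [(i, j)
            for i in range(len(read) - k + 1)
            for j in index.get(read[i:i + k], [])]


def constrained_lcs(matches: List[Match], max_diff: int, max_gap: int) -> List[Match]:
    """Longest chain of matches increasing on both sequences, subject to two
    ASSUMED distance constraints between consecutive matches:
      * the reference gap (j2 - j1) is at most max_gap, and
      * the read gap and the reference gap differ by at most max_diff.
    Plain O(n^2) dynamic programming over matches sorted by read position."""
    pts = sorted(matches)
    n = len(pts)
    if n == 0:
        return []
    best = [1] * n   # best[b]: length of the longest valid chain ending at pts[b]
    prev = [-1] * n  # back-pointer used to recover that chain
    for b in range(n):
        ib, jb = pts[b]
        for a in range(b):
            ia, ja = pts[a]
            if not (ia < ib and ja < jb):
                continue  # a chain must advance on both read and reference
            if jb - ja > max_gap or abs((ib - ia) - (jb - ja)) > max_diff:
                continue  # violates the (assumed) distance constraints
            if best[a] + 1 > best[b]:
                best[b], prev[b] = best[a] + 1, a
    end = max(range(n), key=lambda t: best[t])
    chain: List[Match] = []
    while end != -1:
        chain.append(pts[end])
        end = prev[end]
    return chain[::-1]


def map_read(read: str, ref: str, k: int = 3, max_diff: int = 2,
             max_gap: int = 50) -> Optional[Tuple[int, int, int]]:
    """Report (ref_start, ref_end, chain_length) for the best chain, or None."""
    chain = constrained_lcs(kmer_matches(read, ref, k), max_diff, max_gap)
    if not chain:
        return None
    return chain[0][1], chain[-1][1] + k, len(chain)


print(map_read("GATTACA", "CCGATTACACC"))  # (2, 9, 5)
```

Here the read maps to `ref[2:9]`, supported by a chain of five overlapping 3-mers. The `max_diff` test is what rejects a shared k-mer whose offset is inconsistent with its neighbours (a likely repeat hit). The quadratic loop is chosen for readability; practical long-read mappers use sparser seeds and faster chaining, and the paper's actual constraints and data structures may differ.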

Availability: https://github.com/yw575/mapAlign

Supported by GRF grants [Project Number CityU 11256116 and CityU 11210119] from the Hong Kong SAR government.



Acknowledgements

This work is supported by a GRF grant from the Hong Kong Special Administrative Region, China (CityU 11256116) and a grant from the National Natural Science Foundation of China (NSFC 61972329).

Author information

Correspondence to Lusheng Wang.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Yang, W., Wang, L. (2020). mapAlign: An Efficient Approach for Mapping and Aligning Long Reads to Reference Genomes. In: Cai, Z., Mandoiu, I., Narasimhan, G., Skums, P., Guo, X. (eds) Bioinformatics Research and Applications. ISBRA 2020. Lecture Notes in Computer Science, vol 12304. Springer, Cham. https://doi.org/10.1007/978-3-030-57821-3_10


  • DOI: https://doi.org/10.1007/978-3-030-57821-3_10


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57820-6

  • Online ISBN: 978-3-030-57821-3

  • eBook Packages: Computer Science, Computer Science (R0)
