RNACache: Fast Mapping of RNA-Seq Reads to Transcriptomes Using MinHashing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12742)

Abstract

The alignment of reads to a transcriptome is an important initial step in a variety of bioinformatics RNA-seq pipelines. As traditional alignment-based tools suffer from high runtimes, alternative, alignment-free methods have recently gained increasing importance. We present a novel approach to the detection of local similarities between transcriptomes and RNA-seq reads based on context-aware minhashing. We introduce RNACache, a three-step processing pipeline consisting of minhashing of k-mers, match-based (online) filtering, and coverage-based filtering in order to identify truly expressed transcript isoforms. Our performance evaluation shows that RNACache produces transcriptomic mappings of high accuracy that include significantly fewer erroneous matches compared to the state-of-the-art tools RapMap, Salmon, and Kallisto. Furthermore, it offers scalable and highly competitive runtime performance at low memory consumption on common multi-core workstations. RNACache is publicly available at: https://github.com/jcasc/rnacache.
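The first pipeline step named above, minhashing of k-mers, can be illustrated with a minimal sketch. This is not RNACache's implementation: the bottom-s sketching scheme, the BLAKE2b hash, the parameter values `k` and `s`, and all function names are assumptions chosen for illustration only. The idea shown is that a sequence window is reduced to its `s` smallest k-mer hash values, and a read is compared with a reference window by counting the hash values their sketches share.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer with a stable hash (illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=16, s=4):
    """Bottom-s sketch: the s smallest distinct k-mer hash values of seq.

    The values of k and s are hypothetical; the abstract does not state them.
    """
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:s]


def shared_features(sketch_a, sketch_b):
    """Count the hash values that two sketches have in common."""
    return len(set(sketch_a) & set(sketch_b))


if __name__ == "__main__":
    window = "ACGTACGTTGCAAGCTTAGGCTAACGTTAGC"  # made-up reference window
    read = window[4:28]                           # made-up read drawn from it
    hits = shared_features(sketch(window, k=8), sketch(read, k=8))
    print(f"shared sketch features: {hits}")
```

A read that shares many sketch features with a reference window is a candidate match for that window. The match-based and coverage-based filtering steps are not sketched here, since the abstract gives no detail on how they operate.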

Corresponding author

Correspondence to Julian Cascitti.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Cascitti, J., Niebler, S., Müller, A., Schmidt, B. (2021). RNACache: Fast Mapping of RNA-Seq Reads to Transcriptomes Using MinHashing. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12742. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_31

  • DOI: https://doi.org/10.1007/978-3-030-77961-0_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77960-3

  • Online ISBN: 978-3-030-77961-0

  • eBook Packages: Computer Science (R0)
