RNACache: Fast Mapping of RNA-Seq Reads to Transcriptomes Using MinHashing

Cascitti, Julian; Niebler, Stefan; Müller, André; Schmidt, Bertil

doi:10.1007/978-3-030-77961-0_31

RNACache: Fast Mapping of RNA-Seq Reads to Transcriptomes Using MinHashing

Julian Cascitti¹³,
Stefan Niebler¹³,
André Müller¹³ &
…
Bertil Schmidt¹³

Conference paper
First Online: 09 June 2021

1589 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12742))

Abstract

The alignment of reads to a transcriptome is an important initial step in a variety of bioinformatics RNA-seq pipelines. As traditional alignment-based tools suffer from high runtimes, alternative, alignment-free methods have recently gained increasing importance. We present a novel approach to the detection of local similarities between transcriptomes and RNA-seq reads based on context-aware minhashing. We introduce RNACache, a three-step processing pipeline consisting of minhashing of k-mers, match-based (online) filtering, and coverage-based filtering in order to identify truly expressed transcript isoforms. Our performance evaluation shows that RNACache produces transcriptomic mappings of high accuracy that include significantly fewer erroneous matches compared to the state-of-the-art tools RapMap, Salmon, and Kallisto. Furthermore, it offers scalable and highly competitive runtime performance at low memory consumption on common multi-core workstations. RNACache is publicly available at: https://github.com/jcasc/rnacache.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 99.00; Price excludes VAT (USA)

Softcover Book: USD 129.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Berlin, K., Koren, S., Chin, C.S., et al.: Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat. Biotech. 33, 623–630 (2015)
Article Google Scholar
Bray, N.L., Pimentel, H., Melsted, P., Pachter, L.: Near-optimal probabilistic RNA-seq quantification. Nat. Biotech. 34(5), 525–527 (2016)
Article Google Scholar
Broder, A.Z.: On the resemblance and containment of documents. In: Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171), pp. 21–29 (1997)
Google Scholar
Broder, A.Z.: Identifying and filtering near-duplicate documents. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 1–10. Springer, Heidelberg (2000)
Chapter Google Scholar
Dobin, A., et al.: Star: ultrafast universal RNA-seq aligner. Bioinformatics 29(1), 15–21 (2013)
Article Google Scholar
Garber, M., Grabherr, M.G., Guttman, M., Trapnell, C.: Computational methods for transcriptome annotation and quantification using RNA-seq. Nat. Methods 8(6), 469–477 (2011)
Article Google Scholar
Griebel, T., et al.: Modelling and simulating generic RNA-seq experiments with the flux simulator. Nucleic Acids Res. 40(20), 10073–10083 (2012)
Article Google Scholar
Kobus, R., et al.: A big data approach to metagenomics for all-food-sequencing. BMC Bioinformatics 21(1), 1–15 (2020)
Article Google Scholar
Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with bowtie 2. Nat. Methods 9(4), 357–359 (2012)
Article Google Scholar
Leskovec, J., Rajaraman, A., Ullman, J.D.: Mining of Massive Data Sets. Cambridge University Press, Cambridge (2020)
Book Google Scholar
Li, H.: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM (2013)
Google Scholar
Li, H., et al.: The sequence alignment/map format and SAMtools. Bioinformatics 25(16), 2078–2079 (2009)
Article Google Scholar
Müller, A., Hundt, C., Hildebrandt, A., et al.: Metacache: context-aware classification of metagenomic reads using minhashing. Bioinformatics 33(23), 3740–3748 (2017)
Article Google Scholar
Nellore, A., et al.: Rail-RNA: scalable analysis of RNA-seq splicing and coverage. Bioinformatics 33(24), 4033–4040 (2017)
Google Scholar
Niebler, S., Müller, A., Hankeln, T., Schmidt, B.: Raindrop: rapid activation matrix computation for droplet-based single-cell RNA-seq reads. BMC Bioinformatics 21(1), 1–14 (2020)
Article Google Scholar
Ondov, B.D., Treangen, T.J., Melsted, P., et al.: Mash: fast genome and metagenome distance estimation using minhash. Genome Biol. 17, 132 (2016)
Article Google Scholar
Patro, R., Duggal, G., Love, M.I., Irizarry, R.A., Kingsford, C.: Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14(4), 417–419 (2017)
Article Google Scholar
Patro, R., Mount, S.M., Kingsford, C.: Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat. Biotechnol. 32(5), 462–464 (2014)
Article Google Scholar
Sarkar, H., Zakeri, M., Malik, L., Patro, R.: Towards selective-alignment: bridging the accuracy gap between alignment-based and alignment-free transcript quantification. In: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 27–36. BCB 2018. ACM (2018)
Google Scholar
Schmidt, B., Hildebrandt, A.: Next-generation sequencing: big data meets high performance computing. Drug Discovery Today 22(4), 712–717 (2017)
Article Google Scholar
Srivastava, A., Sarkar, H., Gupta, N., Patro, R.: RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes. Bioinformatics 32(12), i192–i200 (2016)
Article Google Scholar
Stephens, Z.D., et al.: Big data: astronomical or genomical? PLoS Biol. 13(7), e1002195 (2015)
Article Google Scholar
Wang, Z., Gerstein, M., Snyder, M.: RNA-seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10(1), 57–63 (2009)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Johannes Gutenberg-Universität Mainz Institute of Computer Science, 55128, Mainz, Germany
Julian Cascitti, Stefan Niebler, André Müller & Bertil Schmidt

Authors

Julian Cascitti
View author publications
You can also search for this author in PubMed Google Scholar
Stefan Niebler
View author publications
You can also search for this author in PubMed Google Scholar
André Müller
View author publications
You can also search for this author in PubMed Google Scholar
Bertil Schmidt
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Julian Cascitti .

Editor information

Editors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
Ludwig-Maximilians-Universität München, Munich, Germany
Dieter Kranzlmüller
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M. A. Sloot

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Cascitti, J., Niebler, S., Müller, A., Schmidt, B. (2021). RNACache: Fast Mapping of RNA-Seq Reads to Transcriptomes Using MinHashing. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science(), vol 12742. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_31

Download citation

DOI: https://doi.org/10.1007/978-3-030-77961-0_31
Published: 09 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77960-3
Online ISBN: 978-3-030-77961-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics