Abstract
RNA sequencing (RNA-Seq) provides both gene expression and sequence information, which can be exploited jointly to explore cellular processes in general and diseases caused by genomic variants in particular. However, identifying insertions and deletions (indels) from RNA-Seq data still poses a challenge, even though indels play a significant role in, for instance, the development, detection, and treatment of cancer. In this paper, we present a qualitative comparison of selected methods for indel detection from RNA-Seq data. More specifically, we benchmarked two promising aligners and two filter methods on both simulated and real RNA-Seq data. We conclude that in cases where reliable detection of indels is crucial, e.g., in a clinical setting, our pipeline setup is superior to other state-of-the-art approaches.
Acknowledgement
Parts of this work were generously supported by a grant of the German Federal Ministry of Education and Research (031A427B).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Slosarek, T., Kraus, M., Schapranow, MP., Boettinger, E. (2019). Qualitative Comparison of Selected Indel Detection Methods for RNA-Seq Data. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science, vol 11465. Springer, Cham. https://doi.org/10.1007/978-3-030-17938-0_16
DOI: https://doi.org/10.1007/978-3-030-17938-0_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17937-3
Online ISBN: 978-3-030-17938-0
eBook Packages: Computer Science, Computer Science (R0)