Abstract
The identification of splice sites is essential to delineating gene structure and to understanding the complicated alternative mechanisms underlying gene transcriptional regulation. Currently, most existing approaches predict splice sites using deep learning-based strategies. However, they may fail to assign high weights to the important segments of a sequence and thus miss distinctive features. Moreover, they often apply neural networks only as a ‘black box’, drawing criticism for the scarce reasoning behind their decisions. To address these issues, we present a novel method, SpliceSCANNER, which predicts canonical splice sites by integrating an attention mechanism with a convolutional neural network (CNN). Furthermore, we adopt gradient-weighted class activation mapping (Grad-CAM) to interpret the results produced by the models. We trained ten models, covering donor and acceptor sites on five species. Experiments demonstrate that SpliceSCANNER outperforms state-of-the-art methods on most of the datasets. On human data, for instance, it achieves accuracies of 96.36% and 95.77% for donor and acceptor sites, respectively. Finally, cross-organism validation shows that it generalizes well, indicating that it can annotate canonical splice sites in poorly studied species. We anticipate that it can mine potential splicing patterns and bring new advancements to the bioinformatics community. SpliceSCANNER is freely available as a web server at http://www.bioinfo-zhanglab.com/SpliceSCANNER/.
Change history
02 September 2023
A correction has been published.
Acknowledgements
This work was supported by the Guangxi Key Laboratory of Image and Graphic Intelligent Processing (GIIP2004), the National Natural Science Foundation of China (61862017), and the Innovation Project of GUET (Guilin University of Electronic Technology) Graduate Education (2022YCXS063).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, R., Xu, J., Huang, X., Qi, W., Zhang, Y. (2023). SpliceSCANNER: An Accurate and Interpretable Deep Learning-Based Method for Splice Site Prediction. In: Huang, DS., Premaratne, P., Jin, B., Qu, B., Jo, KH., Hussain, A. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2023. Lecture Notes in Computer Science, vol 14088. Springer, Singapore. https://doi.org/10.1007/978-981-99-4749-2_38
DOI: https://doi.org/10.1007/978-981-99-4749-2_38
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4748-5
Online ISBN: 978-981-99-4749-2
eBook Packages: Computer Science, Computer Science (R0)