An automatic algorithm for software vulnerability classification based on CNN and GRU

Multimedia Tools and Applications

Abstract

To manage software vulnerability classification more efficiently, reduce the risk of systems being attacked and damaged, and lower the cost of vulnerability repair, this paper proposes SVC-CG, an automatic algorithm for Software Vulnerability Classification based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) neural network. SVC-CG fuses the two models according to their complementary strengths: the CNN is good at extracting local features from the vulnerability text vectors, and the GRU is good at extracting global features related to the context of the vulnerability text. Merging the features extracted by the two models represents semantic and grammatical information more accurately. First, a Skip-gram language model based on Word2Vec is trained to generate word vectors, mapping the words in each vulnerability text into a space of limited dimension that represents their semantic information. The CNN then extracts local features of the text vectors, and the GRU extracts global features related to the text context. Combining the two yields the SVC-CG neural network algorithm, which classifies vulnerabilities automatically. The experiments use vulnerability data from the National Vulnerability Database (NVD) to train and evaluate SVC-CG. Experimental comparison and analysis show that the SVC-CG algorithm performs well on macro recall, macro precision, and macro F1-score.
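The abstract evaluates the classifier with macro-averaged recall, precision, and F1-score. The following is a minimal sketch of how such macro averages are commonly computed for a multi-class task, not the authors' evaluation code. The function name `macro_scores` and the category labels in the usage lines are hypothetical. Macro F1 is taken here as the unweighted mean of the per-class F1 values, which is an assumption, since the abstract does not state which definition the paper uses.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro_precision, macro_recall, macro_f1).

    Each class contributes equally to the average, regardless of how many
    samples it has. Assumption: macro F1 is the mean of per-class F1 values.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a wrong sample
            fn[truth] += 1   # true class missed one of its samples

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Treat an undefined ratio (zero denominator) as 0.0.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels, for illustration only (not taken from the NVD data).
y_true = ["XSS", "XSS", "SQLi", "SQLi", "Overflow", "Overflow"]
y_pred = ["XSS", "SQLi", "SQLi", "SQLi", "Overflow", "XSS"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(f"macro P={macro_p:.4f}  macro R={macro_r:.4f}  macro F1={macro_f1:.4f}")
```

Macro averaging weights every vulnerability category equally, so rare categories affect the score as much as frequent ones. That property matters when class sizes are imbalanced.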

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61807028, 61802332, and 61772449; the Youth Foundation of Hebei Educational Committee of China under Grant No. QN2021145; the Natural Science Foundation of Hebei Province of China under Grant No. F2019203120; and the Fundamental Research Funds for the Central Universities under Grant No. N182303036. The authors are grateful for the valuable comments and suggestions of the reviewers.

Author information

Corresponding author

Correspondence to Qian Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, Q., Li, Y., Wang, Y. et al. An automatic algorithm for software vulnerability classification based on CNN and GRU. Multimed Tools Appl 81, 7103–7124 (2022). https://doi.org/10.1007/s11042-022-12049-1

  • DOI: https://doi.org/10.1007/s11042-022-12049-1
