Abstract
To improve the efficiency of software vulnerability management, reduce the risk of systems being attacked and damaged, and lower the cost of vulnerability repair, this paper proposes SVC-CG, an automatic algorithm for Software Vulnerability Classification based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) neural network. SVC-CG fuses the two models to exploit their complementary strengths: the CNN is good at extracting local features from the vulnerability text vectors, while the GRU is good at extracting global features related to the context of the vulnerability text. Merging the features extracted by the two models represents the semantic and grammatical information of the text more accurately. First, a Skip-gram language model based on Word2Vec is trained to generate word vectors, mapping the words in each vulnerability text into a vector space of limited dimension that represents their semantic information. Then the CNN extracts the local features of the text vectors, and the GRU extracts the global features related to the text context. Finally, the two complementary models are combined into the SVC-CG neural network, which classifies vulnerabilities automatically. The experiments use vulnerability data from the National Vulnerability Database (NVD) to train and evaluate SVC-CG. Experimental comparison and analysis show that SVC-CG performs well on macro recall, macro precision, and macro F1-score.
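The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common definition in which precision, recall, and F1 are computed per class and then averaged with equal weight; the paper's exact formulas are not given in this section and may differ (for example, macro F1 is sometimes defined as the harmonic mean of macro precision and macro recall). The function name `macro_scores` and the class labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Assumption: each metric is computed per class and then averaged
    with equal weight per class, regardless of class frequency.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical vulnerability categories, not taken from the paper's data.
y_true = ["XSS", "XSS", "SQLi", "SQLi", "Overflow", "Overflow"]
y_pred = ["XSS", "SQLi", "SQLi", "SQLi", "Overflow", "XSS"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(f"macro precision={macro_p:.4f} recall={macro_r:.4f} F1={macro_f1:.4f}")
```

Because every class contributes equally to the average, macro metrics penalize a classifier that performs poorly on rare vulnerability categories, which is relevant when category frequencies are imbalanced.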
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant Nos. 61807028, 61802332, and 61772449; the Youth Foundation of Hebei Educational Committee of China under Grant No. QN2021145; the Natural Science Foundation of Hebei Province of China under Grant No. F2019203120; and the Fundamental Research Funds for the Central Universities under Grant No. N182303036. The authors are grateful for the valuable comments and suggestions of the reviewers.
Cite this article
Wang, Q., Li, Y., Wang, Y. et al. An automatic algorithm for software vulnerability classification based on CNN and GRU. Multimed Tools Appl 81, 7103–7124 (2022). https://doi.org/10.1007/s11042-022-12049-1
DOI: https://doi.org/10.1007/s11042-022-12049-1