An automatic algorithm for software vulnerability classification based on CNN and GRU

Wang, Qian; Li, Yazhou; Wang, Yan; Ren, Jiadong

doi:10.1007/s11042-022-12049-1

Published: 24 January 2022

Volume 81, pages 7103–7124 (2022)

Multimedia Tools and Applications

Abstract

To make software vulnerability classification more efficient to manage, reduce the risk of systems being attacked and damaged, and lower the cost of vulnerability repair, this paper proposes SVC-CG, an automatic algorithm for Software Vulnerability Classification based on a convolutional neural network (CNN) and a gated recurrent unit (GRU) neural network. SVC-CG fuses the two models to exploit their complementary strengths: the CNN is good at extracting local features from the vulnerability text vectors, and the GRU is good at extracting global features related to the context of the vulnerability text. Merging the features extracted by the two complementary models represents semantic and grammatical information more accurately. First, a Skip-gram language model based on Word2Vec is trained to generate word vectors, mapping the words of each vulnerability text into a space of limited dimension that represents their semantic information. Then the CNN extracts local features from the text vectors, and the GRU extracts global features related to the text context. The combined features are used to classify vulnerabilities automatically. The experiments use vulnerability data from the National Vulnerability Database (NVD) to train and evaluate SVC-CG. In the experimental comparison and analysis, SVC-CG performs well on macro recall, macro precision, and macro F1-score.
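
The macro-averaged metrics named at the end of the abstract can be illustrated with a short sketch. The Python below is not the authors' evaluation code. It assumes the common convention in which precision, recall, and F1 are computed per class and then averaged with equal weight per class; the abstract does not state the exact definition used. The function name `macro_scores` and the CWE-style labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Assumption: each metric is computed per class and then averaged
    with equal weight over every class seen in y_true or y_pred.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels for six vulnerability descriptions (illustration only).
y_true = ["CWE-79", "CWE-79", "CWE-89", "CWE-89", "CWE-119", "CWE-119"]
y_pred = ["CWE-79", "CWE-89", "CWE-89", "CWE-89", "CWE-119", "CWE-79"]

precision, recall, f1 = macro_scores(y_true, y_pred)
# prints: precision=0.722 recall=0.667 f1=0.656
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because every class carries equal weight, a rare vulnerability category influences the macro scores as much as a frequent one, which makes macro averaging a natural choice when category sizes are imbalanced.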

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61807028, 61802332, and 61772449, the Youth Foundation of Hebei Educational Committee of China under Grant No. QN2021145, the Natural Science Foundation of Hebei Province of China under Grant No. F2019203120, and the Fundamental Research Funds for the Central Universities under Grant No. N182303036. The authors are grateful for the valuable comments and suggestions of the reviewers.

Author information

Authors and Affiliations

Computer Virtual Technology and System Integration Laboratory of Hebei Province, College of Information Science and Engineering, Yanshan University, Qinhuangdao, 066000, Hebei, China
Qian Wang & Jiadong Ren
China Mobile Xiong’an Information and Communication Technology Co. Ltd, Xiong’an, 071700, Hebei, China
Yazhou Li
Computing Center, Northeastern University at Qinhuangdao, Qinhuangdao, 066000, Hebei, China
Yan Wang

Corresponding author

Correspondence to Qian Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, Q., Li, Y., Wang, Y. et al. An automatic algorithm for software vulnerability classification based on CNN and GRU. Multimed Tools Appl 81, 7103–7124 (2022). https://doi.org/10.1007/s11042-022-12049-1

Received: 22 January 2021
Revised: 24 May 2021
Accepted: 03 January 2022
Published: 24 January 2022
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11042-022-12049-1
