An extensive study of the effects of different deep learning models on code vulnerability detection in Python code

Wang, Rongcun; Xu, Senlei; Ji, Xingyu; Tian, Yuan; Gong, Lina; Wang, Ke

doi:10.1007/s10515-024-00413-4

Published: 31 January 2024

Automated Software Engineering, Volume 31, article number 15 (2024)

Rongcun Wang¹, Senlei Xu¹, Xingyu Ji¹, Yuan Tian², Lina Gong³ & Ke Wang¹

Abstract

Deep learning has made great progress in automated code vulnerability detection, and several deep-learning-based vulnerability detection approaches have been proposed. However, few studies have empirically examined how different deep learning models affect vulnerability detection in Python code. We therefore aim to cover a broader range of code representation learning models and classification models for vulnerability detection. We design and conduct an empirical study that evaluates eighteen deep learning architectures, derived from combining three representation learning models (Word2Vec, fastText, and CodeBERT) with six classification models (random forest, XGBoost, Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU)). We also empirically compare two machine learning strategies, namely the attention and bi-directional mechanisms, and conduct statistical significance and effect size analyses between the models. In terms of precision, recall, and F-score, Word2Vec outperforms both CodeBERT (a Bidirectional Encoder Representations from Transformers model) and fastText. Likewise, LSTM and GRU outperform the other classification models we studied. Bi-directional LSTM and GRU with attention using Word2Vec are the two best-performing models for code vulnerability detection in Python. Moreover, their advantage over LSTM and GRU models that use only a single mechanism has a medium or large effect size. Both the representation learning model and the classification model have an important influence on vulnerability detection in Python code, and the bi-directional and attention mechanisms also affect detection performance.
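The evaluation design described in the abstract can be sketched in plain Python: the 3 × 6 grid of representation/classifier pairs that yields eighteen architectures, the precision, recall and F-score used to rank them, and an effect-size measure for pairwise comparisons. The abstract does not name the effect-size statistic, so this sketch assumes Cliff's delta with the magnitude thresholds of Romano et al. (2006), a common choice in empirical software engineering. All function and variable names here are hypothetical illustrations, not the authors' code.

```python
from itertools import product

# Model names as listed in the abstract; pairing every representation
# with every classifier gives 3 x 6 = 18 architectures.
REPRESENTATIONS = ["Word2Vec", "fastText", "CodeBERT"]
CLASSIFIERS = ["RandomForest", "XGBoost", "MLP", "CNN", "LSTM", "GRU"]
ARCHITECTURES = list(product(REPRESENTATIONS, CLASSIFIERS))


def precision_recall_f1(y_true, y_pred):
    """Binary metrics, assuming label 1 = vulnerable and 0 = not vulnerable."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cliffs_delta(xs, ys):
    """Cliff's delta = P(x > y) - P(x < y) over all pairs; ranges over [-1, 1]."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def delta_magnitude(d):
    """Assumed thresholds on |delta| (Romano et al., 2006)."""
    d = abs(d)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```

As a usage example with made-up per-run scores (not results from the study), `cliffs_delta([0.91, 0.89, 0.92, 0.90], [0.85, 0.88, 0.86, 0.87])` returns `1.0`, because every value in the first list exceeds every value in the second, and `delta_magnitude` classifies that as "large". Cliff's delta is rank-based, so it pairs naturally with non-parametric significance tests such as the Wilcoxon signed-rank test.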

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and helpful suggestions. This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61673384 and 62202223, and by the Natural Science Foundation of Jiangsu Province, China, under Grant No. BK20220881.

Author information

Authors and Affiliations

School of Computer Science and Technology, China University of Mining and Technology, No. 1, Daxue Road, Xuzhou, 221116, Jiangsu, China
Rongcun Wang, Senlei Xu, Xingyu Ji & Ke Wang
School of Computing, Queen’s University, Kingston, Canada
Yuan Tian
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, No. 9, Jiangjun Street, Nanjing, 210016, Jiangsu, China
Lina Gong

Contributions

RW and SX proposed the methodology and wrote the manuscript text; XJ analyzed the data and tested all programs related to the experiment; YT and LG edited the manuscript; KW supervised the study. All authors reviewed the manuscript.

Corresponding author

Correspondence to Ke Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, R., Xu, S., Ji, X. et al. An extensive study of the effects of different deep learning models on code vulnerability detection in Python code. Autom Softw Eng 31, 15 (2024). https://doi.org/10.1007/s10515-024-00413-4

Received: 09 May 2023
Accepted: 04 January 2024
Published: 31 January 2024
DOI: https://doi.org/10.1007/s10515-024-00413-4
