
An extensive study of the effects of different deep learning models on code vulnerability detection in Python code

Automated Software Engineering

Abstract

Deep learning has driven significant progress in automated code vulnerability detection, and several deep-learning-based detection approaches have been proposed. However, few studies have empirically examined how different deep learning models affect vulnerability detection in Python code. To address this gap, we cover a broader range of code representation learning models and classification models for vulnerability detection. We design and conduct an empirical study that evaluates eighteen deep learning architectures on code vulnerability detection. These architectures are obtained by combining three representation learning models (Word2Vec, fastText, and CodeBERT) with six classification models (random forest, XGBoost, Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU)). We also empirically compare two mechanisms: attention and bi-directionality. Statistical significance tests and effect size analyses are conducted between the different models. In terms of precision, recall, and F-score, Word2Vec outperforms both fastText and CodeBERT, a model based on Bidirectional Encoder Representations from Transformers (BERT). Likewise, LSTM and GRU outperform the other classification models we studied. The bi-directional LSTM and GRU models with attention, both using Word2Vec, are the two best-performing models for code vulnerability detection in Python. Moreover, their improvements over LSTM and GRU models that use only one of the two mechanisms have medium or large effect sizes. Both the choice of representation learning model and the choice of classification model substantially influence vulnerability detection in Python code, and the bi-directional and attention mechanisms also affect detection performance.
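The abstract describes a 3 × 6 grid of architectures, evaluated by precision, recall, and F-score, with effect sizes reported between models. The stdlib-only Python sketch below illustrates how such a grid and those measures could be computed. It is not the authors' code. Several details are assumptions: "F-score" is computed here as F1, the effect-size measure is assumed to be Cliff's delta, and the magnitude thresholds (0.147, 0.33, 0.474) follow the convention commonly attributed to Romano et al. The scores in the usage example are made-up illustrative numbers, not results from the study.

```python
from itertools import product

# The 3 x 6 grid of architectures named in the abstract (labels are illustrative).
REPRESENTATIONS = ["Word2Vec", "fastText", "CodeBERT"]
CLASSIFIERS = ["RandomForest", "XGBoost", "MLP", "CNN", "LSTM", "GRU"]
ARCHITECTURES = list(product(REPRESENTATIONS, CLASSIFIERS))


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1 for the 'vulnerable' (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs; ranges from -1 to 1."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def delta_magnitude(delta):
    """Label |delta| with the commonly used Romano et al. thresholds (assumed)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


if __name__ == "__main__":
    print(len(ARCHITECTURES))  # 18

    # Hypothetical labels: 1 = vulnerable code fragment, 0 = not vulnerable.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
    print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)

    # Hypothetical per-run F-scores for two model variants.
    variant_a = [0.82, 0.85, 0.84, 0.86]
    variant_b = [0.80, 0.83, 0.81, 0.84]
    delta = cliffs_delta(variant_a, variant_b)
    print(delta, delta_magnitude(delta))  # 0.6875 large
```

Cliff's delta is a non-parametric, rank-based effect size, so it makes no normality assumption about the score distributions. That property is why it is often paired with rank-based significance tests in empirical software engineering comparisons.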


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7




Acknowledgements

The authors would like to thank the anonymous reviewers for valuable comments and helpful suggestions. This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61673384 and 62202223, and the Natural Science Foundation of Jiangsu Province, China under grant No. BK20220881.

Author information


Contributions

RW and SX proposed the methodology and wrote the manuscript text, XJ analyzed the data and tested all programs related to the experiment, YT and LG edited the manuscript, and KW supervised the study. All authors reviewed the manuscript.

Corresponding author

Correspondence to Ke Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, R., Xu, S., Ji, X. et al. An extensive study of the effects of different deep learning models on code vulnerability detection in Python code. Autom Softw Eng 31, 15 (2024). https://doi.org/10.1007/s10515-024-00413-4


  • DOI: https://doi.org/10.1007/s10515-024-00413-4
