
AI-Powered Vulnerability Detection for Secure Source Code Development

  • Conference paper
Innovative Security Solutions for Information Technology and Communications (SecITC 2022)

Abstract

Vulnerable source code in software applications causes serious reliability and security issues. To reduce these issues, software security principles should be integrated at the early stages of the development lifecycle. Artificial Intelligence (AI) can be applied to detect vulnerabilities in source code. In this research, a Machine Learning (ML) based method is proposed to detect source code vulnerabilities in C/C++ applications. Furthermore, Explainable AI (XAI) is applied to help developers identify vulnerable source code tokens and understand their causes. In binary classification, the proposed model detects whether code is vulnerable with an F1-Score of 0.96. In vulnerability type detection, a multi-class classification based on CWE-ID, the model achieves an F1-Score of 0.85. Several ML classifiers were tested; Random Forest (RF) performed well in the binary approach and Extreme Gradient Boosting (XGB) in the multi-class approach. Because the model is trained on a dataset containing real source code, it is highly generalizable.


Notes

  1. https://tree-sitter.github.io/tree-sitter

  2. https://cppcheck.sourceforge.io

  3. https://github.com/david-a-wheeler/flawfinder

  4. https://github.com/eliben/pycparser

  5. https://cve.mitre.org/

  6. https://samate.nist.gov/SARD/

  7. https://www.nist.gov/itl/ssd/software-quality-group/static-analysis-tool-exposition-sate-iv


Acknowledgment

This work was funded by the Scottish Funding Council. We are thankful to the funder for their support.

Author information


Corresponding author

Correspondence to Sampath Rajapaksha.


Appendix: Common Weaknesses in C/C++ Source Code


| CWE-ID | CWE-Name | Sample Vulnerable C/C++ Code |
|---|---|---|
| CWE-20 | Improper Input Validation | `board = (board_square_t*) malloc(m * n * sizeof(board_square_t));` |
| CWE-78 | Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection') | `system(NULL)` |
| CWE-120 | Buffer Copy without Checking Size of Input ('Classic Buffer Overflow') | `strcpy(buf, string);` |
| CWE-126 | Buffer Over-read | `strncpy(Filename, argv[1], sizeof(Filename));` |
| CWE-134 | Use of Externally-Controlled Format String | `snprintf(buf, 128, argv[1]);` |
| CWE-190 | Integer Overflow or Wraparound | `xmalloc(nresp*sizeof(char*));` |
| CWE-327 | Use of a Broken or Risky Cryptographic Algorithm | `EVP_des_ecb();` |
| CWE-362 | Concurrent Execution using Shared Resource with Improper Synchronization ('Race Condition') | `pthread_mutex_lock(mutex);` |
| CWE-401 | Missing Release of Memory after Effective Lifetime | `char *buf = (char *) malloc(BLOCK_SIZE); read(fd, buf, BLOCK_SIZE) != BLOCK_SIZE;` |
| CWE-457 | Use of Uninitialized Variable | `char *test_string; if (i != err_val) test_string = "Hello World!"; printf("%s", test_string);` |
| CWE-676 | Use of Potentially Dangerous Function | `char buf[24]; strcpy(buf, string);` |
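To make three of the weaknesses listed above more concrete, the following is a minimal sketch of how the risky calls for CWE-120/CWE-676, CWE-190, and CWE-134 might be guarded in C. It is an illustration only and is not taken from the paper or its dataset; the helper names `copy_checked`, `alloc_array_checked`, and `format_checked` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CWE-120 / CWE-676: bounded replacement for strcpy(buf, string).
 * Hypothetical helper. Returns 0 on success, or -1 if src (including
 * its terminating NUL) does not fit in dst. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;
    len = strlen(src);
    if (len >= dst_size)
        return -1;              /* reject instead of overflowing dst */
    memcpy(dst, src, len + 1);  /* +1 copies the terminating NUL */
    return 0;
}

/* CWE-190: allocate count * elem_size bytes only when the product
 * cannot wrap around size_t. Hypothetical helper. */
void *alloc_array_checked(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return NULL;            /* the multiplication would overflow */
    return malloc(count * elem_size);
}

/* CWE-134: pass untrusted text as data through a fixed "%s" format,
 * never as the format string itself. Hypothetical helper. */
int format_checked(char *dst, size_t dst_size, const char *untrusted)
{
    return snprintf(dst, dst_size, "%s", untrusted);
}
```

With these helpers, a call such as `copy_checked(buf, sizeof buf, string)` fails cleanly when `string` is too long for `buf`, and `alloc_array_checked(nresp, sizeof(char *))` returns `NULL` instead of allocating an undersized block when the size computation would wrap.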


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Rajapaksha, S., Senanayake, J., Kalutarage, H., Al-Kadri, M.O. (2023). AI-Powered Vulnerability Detection for Secure Source Code Development. In: Bella, G., Doinea, M., Janicke, H. (eds) Innovative Security Solutions for Information Technology and Communications. SecITC 2022. Lecture Notes in Computer Science, vol 13809. Springer, Cham. https://doi.org/10.1007/978-3-031-32636-3_16


  • DOI: https://doi.org/10.1007/978-3-031-32636-3_16


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-32635-6

  • Online ISBN: 978-3-031-32636-3

  • eBook Packages: Computer Science (R0)
