Abstract
Vulnerable source code in software applications causes serious reliability and security issues. Software security principles should be integrated at the early stages of the development lifecycle to reduce these issues, and Artificial Intelligence (AI) can be applied to detect vulnerabilities in source code. This research proposes a Machine Learning (ML) based method to detect source code vulnerabilities in C/C++ applications. Explainable AI (XAI) is also applied to help developers identify vulnerable source code tokens and understand their causes. In binary classification, the proposed model detects whether code is vulnerable with an F1-Score of 0.96. In vulnerability type detection, a multi-class classification based on CWE-ID, the model achieves an F1-Score of 0.85. Several ML classifiers were tested; Random Forest (RF) performed well in the binary approach and Extreme Gradient Boosting (XGB) in the multi-class approach. Because the model is trained on a dataset containing real source code, it is expected to generalize well.
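
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1-Score can be computed from confusion counts. It is not taken from the paper: the type `class_counts_t` and the functions `f1_score` and `macro_f1` are hypothetical names, and the abstract does not state which averaging scheme was used for the multi-class result, so the unweighted (macro) mean shown here is only an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Confusion counts for one class (hypothetical helper type). */
typedef struct {
    unsigned long tp; /* true positives  */
    unsigned long fp; /* false positives */
    unsigned long fn; /* false negatives */
} class_counts_t;

/* F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and
 * recall. Returns 0.0 when the denominator is zero. */
double f1_score(class_counts_t c)
{
    double denom = 2.0 * (double)c.tp + (double)c.fp + (double)c.fn;
    return denom > 0.0 ? (2.0 * (double)c.tp) / denom : 0.0;
}

/* Macro F1: unweighted mean of the per-class F1 values.
 * ASSUMPTION: the averaging scheme is illustrative only. */
double macro_f1(const class_counts_t *classes, size_t n)
{
    assert(classes != NULL || n == 0);
    if (n == 0) {
        return 0.0;
    }
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += f1_score(classes[i]);
    }
    return sum / (double)n;
}
```

In the binary setting, a single `class_counts_t` for the "vulnerable" class is enough; in the multi-class setting, one entry per CWE-ID would be passed to `macro_f1`.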
Acknowledgment
This work was funded by the Scottish Funding Council. We thank the funder for its support.
Appendix: Common Weaknesses in C/C++ Source Code
CWE-ID | CWE-Name | Sample Vulnerable C/C++ Code |
---|---|---|
CWE-20 | Improper Input Validation | board = (board_square_t*) malloc(m * n * sizeof(board_square_t)); |
CWE-78 | Improper Neutralization of Special Elements used in an OS Command (‘OS Command Injection’) | system(NULL) |
CWE-120 | Buffer Copy without Checking Size of Input (‘Classic Buffer Overflow’) | strcpy(buf, string); |
CWE-126 | Buffer Over-read | strncpy(Filename, argv[1], sizeof(Filename)); |
CWE-134 | Use of Externally-Controlled Format String | snprintf(buf, 128, argv[1]); |
CWE-190 | Integer Overflow or Wraparound | response = xmalloc(nresp*sizeof(char*)); |
CWE-327 | Use of a Broken or Risky Cryptographic Algorithm | EVP_des_ecb(); |
CWE-362 | Concurrent Execution using Shared Resource with Improper Synchronization (‘Race Condition’) | pthread_mutex_lock(mutex); |
CWE-401 | Missing Release of Memory after Effective Lifetime | char* buf = (char*) malloc(BLOCK_SIZE); read(fd, buf, BLOCK_SIZE) != BLOCK_SIZE; |
CWE-457 | Use of Uninitialized Variable | char *test_string; if (i != err_val) test_string = "Hello World!"; printf("%s", test_string); |
CWE-676 | Use of Potentially Dangerous Function | char buf[24]; strcpy(buf, string); |
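
As an illustration of how some of the weaknesses in the table are commonly mitigated, the sketch below shows hardened counterparts for three rows: a bounded copy (CWE-120, CWE-676), an overflow-checked allocation (CWE-190), and a fixed format string (CWE-134). These are not the authors' fixes. The function names `copy_bounded`, `checked_array_alloc`, and `format_untrusted` are hypothetical, and the error-handling conventions (returning -1 or NULL) are assumptions chosen for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CWE-120 / CWE-676: copy only if src, including its terminator,
 * fits in dst. Returns 0 on success, -1 if src is too long (dst is
 * then set to the empty string). */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1); /* +1 copies the terminating NUL */
    return 0;
}

/* CWE-190: reject element counts whose byte size would wrap around
 * before calling malloc. Returns NULL on overflow, zero size, or
 * allocation failure. */
void *checked_array_alloc(size_t count, size_t elem_size)
{
    if (count == 0 || elem_size == 0) {
        return NULL;
    }
    if (count > SIZE_MAX / elem_size) {
        return NULL; /* count * elem_size would overflow size_t */
    }
    return malloc(count * elem_size);
}

/* CWE-134: pass untrusted text as an argument to a constant "%s"
 * format instead of using it as the format string itself.
 * Returns 0 on success, -1 on truncation or encoding error. */
int format_untrusted(char *dst, size_t dst_size, const char *untrusted)
{
    assert(dst != NULL && untrusted != NULL && dst_size > 0);
    int n = snprintf(dst, dst_size, "%s", untrusted);
    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

Each helper makes the size or format assumption explicit at the call site, which is the property the vulnerable samples in the table lack.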
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rajapaksha, S., Senanayake, J., Kalutarage, H., Al-Kadri, M.O. (2023). AI-Powered Vulnerability Detection for Secure Source Code Development. In: Bella, G., Doinea, M., Janicke, H. (eds) Innovative Security Solutions for Information Technology and Communications. SecITC 2022. Lecture Notes in Computer Science, vol 13809. Springer, Cham. https://doi.org/10.1007/978-3-031-32636-3_16
DOI: https://doi.org/10.1007/978-3-031-32636-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32635-6
Online ISBN: 978-3-031-32636-3
eBook Packages: Computer Science (R0)