AI-Powered Vulnerability Detection for Secure Source Code Development

Rajapaksha, Sampath; Senanayake, Janaka; Kalutarage, Harsha; Al-Kadri, Mhd Omar

doi:10.1007/978-3-031-32636-3_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13809)

Included in the following conference series:

International Conference on Information Technology and Communications Security

Abstract

Vulnerable source code in software applications causes serious reliability and security issues. To reduce these issues, software security principles should be integrated at the early stages of the development lifecycle. Artificial Intelligence (AI) can be applied to detect vulnerabilities in source code. In this research, a Machine Learning (ML) based method is proposed to detect source code vulnerabilities in C/C++ applications. Furthermore, Explainable AI (XAI) is applied to help developers identify vulnerable source code tokens and understand their causes. In binary classification, where the model decides whether code is vulnerable or not, it achieves an F1-Score of 0.96. In vulnerability type detection, a multi-class classification based on CWE-ID, it achieves an F1-Score of 0.85. Several ML classifiers were tested, and Random Forest (RF) and Extreme Gradient Boosting (XGB) performed well in the binary and multi-class approaches, respectively. Since the model is trained on a dataset containing real-world source code, it is highly generalizable.

Acknowledgment

This work was funded by the Scottish Funding Council. We are thankful to the funder for its support.

Author information

Authors and Affiliations

School of Computing, Robert Gordon University, Aberdeen, AB10 7QB, UK
Sampath Rajapaksha, Janaka Senanayake & Harsha Kalutarage
School of Computing and Digital Technology, Birmingham City University, Birmingham, B5 5JU, UK
Mhd Omar Al-Kadri

Corresponding author

Correspondence to Sampath Rajapaksha.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy
Giampaolo Bella
Bucharest University of Economic Studies, Bucharest, Romania
Mihai Doinea
Edith Cowan University, Joondalup, WA, Australia
Helge Janicke

Appendix: Common Weaknesses in C/C++ Source Code

CWE-ID	CWE-Name	Sample Vulnerable C/C++ Code
CWE-20	Improper Input Validation	board = (board_square_t *) malloc(m * n * sizeof(board_square_t));
CWE-78	Improper Neutralization of Special Elements used in an OS Command (‘OS Command Injection’)	system(NULL)
CWE-120	Buffer Copy without Checking Size of Input (‘Classic Buffer Overflow’)	strcpy(buf, string);
CWE-126	Buffer Over-read	strncpy(Filename, argv[1], sizeof(Filename));
CWE-134	Use of Externally-Controlled Format String	snprintf(buf, 128, argv[1]);
CWE-190	Integer Overflow or Wraparound	response = xmalloc(nresp * sizeof(char *));
CWE-327	Use of a Broken or Risky Cryptographic Algorithm	EVP_des_ecb();
CWE-362	Concurrent Execution using Shared Resource with Improper Synchronization (‘Race Condition’)	pthread_mutex_lock(mutex);
CWE-401	Missing Release of Memory after Effective Lifetime	char *buf = (char *) malloc(BLOCK_SIZE); read(fd, buf, BLOCK_SIZE) != BLOCK_SIZE;
CWE-457	Use of Uninitialized Variable	char *test_string; if (i != err_val) test_string = "Hello World!"; printf("%s", test_string);
CWE-676	Use of Potentially Dangerous Function	char buf[24]; strcpy(buf, string);
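
For contrast with the vulnerable samples in the table, the following is a minimal sketch of hardened counterparts for three of the rows (CWE-120/CWE-676, CWE-134, and CWE-190). It is illustrative only and is not taken from the paper: the function names copy_bounded, format_untrusted, and alloc_array_checked are hypothetical, and the error conventions (returning -1 or NULL) are assumptions chosen for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CWE-120 / CWE-676: bounded replacement for strcpy(buf, string).
 * Hypothetical helper. Returns 0 on success, or -1 if src does not fit,
 * in which case dst is left as an empty string. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1); /* len + 1 also copies the terminating NUL */
    return 0;
}

/* CWE-134: untrusted text is passed as an argument to a fixed "%s" format,
 * never as the format string itself. Returns what snprintf returns. */
int format_untrusted(char *dst, size_t dst_size, const char *untrusted)
{
    return snprintf(dst, dst_size, "%s", untrusted);
}

/* CWE-190: refuse the allocation if count * elem_size would wrap around
 * SIZE_MAX. Hypothetical helper; the caller must free() the result. */
void *alloc_array_checked(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;
    }
    return malloc(count * elem_size);
}
```

A caller would replace strcpy(buf, string) with copy_bounded(buf, sizeof buf, string) and check the return value, and would replace xmalloc(nresp * sizeof(char *)) with alloc_array_checked(nresp, sizeof(char *)) followed by a NULL check. Whether rejecting or truncating oversized input is the better policy depends on the application.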

About this paper

Cite this paper

Rajapaksha, S., Senanayake, J., Kalutarage, H., Al-Kadri, M.O. (2023). AI-Powered Vulnerability Detection for Secure Source Code Development. In: Bella, G., Doinea, M., Janicke, H. (eds) Innovative Security Solutions for Information Technology and Communications. SecITC 2022. Lecture Notes in Computer Science, vol 13809. Springer, Cham. https://doi.org/10.1007/978-3-031-32636-3_16

DOI: https://doi.org/10.1007/978-3-031-32636-3_16
Published: 12 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32635-6
Online ISBN: 978-3-031-32636-3
eBook Packages: Computer Science, Computer Science (R0)
