Enhancing Security Assurance in Software Development: AI-Based Vulnerable Code Detection with Static Analysis

Rajapaksha, Sampath; Senanayake, Janaka; Kalutarage, Harsha; Al-Kadri, Mhd Omar

doi:10.1007/978-3-031-54129-2_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14399)

Included in the following conference series:

European Symposium on Research in Computer Security

Abstract

Vulnerable source code in software applications causes significant reliability and security issues, which can be mitigated by integrating and assuring software security principles during the early stages of the development lifecycle. One promising approach to identifying vulnerabilities in source code is the use of Artificial Intelligence (AI). This research proposes an AI-based method for detecting source code vulnerabilities and leverages Explainable AI to help developers identify and understand vulnerable source code tokens. To train the model, a web crawler was used to collect a real-world dataset of 600,000 source code samples, which were annotated using static analysers. Several machine learning (ML) classifiers were tested on a feature vector generated using Natural Language Processing techniques. The Random Forest and Extreme Gradient Boosting classifiers were found to perform well in binary and multi-class approaches, respectively. The proposed model achieved a 0.96 F1-Score in binary classification and a 0.85 F1-Score in multi-class classification based on Common Weakness Enumeration (CWE) IDs. The model, trained on a dataset of real-world source code, is highly generalisable and has been integrated into a live web portal to validate its performance on real-world code vulnerabilities.
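
The abstract names two steps that a small worked example may make more concrete: turning source code into a token-based feature vector, and scoring predictions with the F1-Score. The following is a minimal sketch and not the authors' implementation. The regex tokeniser, the bag-of-words counting, the toy vocabulary, the toy labels and the macro averaging over CWE classes are all assumptions made for illustration; the abstract does not state which NLP features or which averaging scheme were used.

```python
import re
from collections import Counter

# Hypothetical tokeniser (an assumption): identifiers, integers,
# and single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(code):
    """Split a source code string into lexical tokens."""
    return TOKEN_RE.findall(code)


def bag_of_words(code, vocabulary):
    """Count how often each vocabulary token occurs in one code sample."""
    counts = Counter(tokenize(code))
    return [counts[token] for token in vocabulary]


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_score(y_true, y_pred, label) for label in labels) / len(labels)


# Toy feature vector for one C statement (illustrative vocabulary).
vocab = ["strcpy", "buf", "(", "gets"]
features = bag_of_words("strcpy(buf, buf);", vocab)

# Toy binary labels: 1 = vulnerable, 0 = not vulnerable.
binary_f1 = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0], positive=1)

# Toy multi-class labels keyed by CWE ID (illustrative only).
true_cwe = ["CWE-120", "CWE-120", "CWE-78", "none"]
pred_cwe = ["CWE-120", "CWE-78", "CWE-78", "none"]
multi_f1 = macro_f1(true_cwe, pred_cwe)

print(features, round(binary_f1, 3), round(multi_f1, 3))
```

In a full pipeline, vectors like `features` would be passed to a classifier such as Random Forest or Extreme Gradient Boosting, and the per-token counts give an explanation method something to point at when it highlights which tokens drove a prediction.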

Author information

Authors and Affiliations

School of Computing, Robert Gordon University, Aberdeen, AB10 7QB, UK
Sampath Rajapaksha, Janaka Senanayake & Harsha Kalutarage
University of Doha for Science and Technology, Doha, Qatar
Mhd Omar Al-Kadri

Corresponding author

Correspondence to Sampath Rajapaksha.

Editor information

Editors and Affiliations

Norwegian University of Science and Technology, Gjøvik, Norway
Sokratis Katsikas
Norwegian Computing Center, Oslo, Norway
Habtamu Abie
University of Trento, Trento, Italy
Silvio Ranise
University of Genoa, Genoa, Italy
Luca Verderame
Consiglio Nazionale delle Ricerche (CNR), Genoa, Italy
Enrico Cambiaso
SINTEF A.S., Oslo, Norway
Rita Ugarelli
Instituto Superior de Engenharia do Porto, Porto, Portugal
Isabel Praça
Hong Kong Polytechnic University, Hong Kong, China
Wenjuan Li
Technical University of Denmark, Kongens Lyngby, Denmark
Weizhi Meng
University of Nottingham, Nottingham, UK
Steven Furnell
Norwegian University of Science and Technology, Gjøvik, Norway
Basel Katt
Norwegian Computing Center, Oslo, Norway
Sandeep Pirbhulal
Institute for Energy Technology (IFE), Halden, Norway
Ankur Shukla
University of Calabria, Rende, Italy
Michele Ianni
University of Verona, Verona, Italy
Mila Dalla Preda
The University of Texas at San Antonio, San Antonio, TX, USA
Kim-Kwang Raymond Choo
University of Lisbon, Lisbon, Portugal
Miguel Pupo Correia
University of Twente, Enschede, The Netherlands
Abhishta Abhishta
University of Amsterdam, Amsterdam, The Netherlands
Giovanni Sileno
Open University in the Netherlands, Heerlen, The Netherlands
Mina Alishahi
Robert Gordon University, Aberdeen, UK
Harsha Kalutarage
Osaka University, Osaka, Japan
Naoto Yanai

Cite this paper

Rajapaksha, S., Senanayake, J., Kalutarage, H., Al-Kadri, M.O. (2024). Enhancing Security Assurance in Software Development: AI-Based Vulnerable Code Detection with Static Analysis. In: Katsikas, S., et al. Computer Security. ESORICS 2023 International Workshops. ESORICS 2023. Lecture Notes in Computer Science, vol 14399. Springer, Cham. https://doi.org/10.1007/978-3-031-54129-2_20

DOI: https://doi.org/10.1007/978-3-031-54129-2_20
Published: 12 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54128-5
Online ISBN: 978-3-031-54129-2
eBook Packages: Computer Science (R0)
