Enhancing Security Assurance in Software Development: AI-Based Vulnerable Code Detection with Static Analysis

  • Conference paper
Computer Security. ESORICS 2023 International Workshops (ESORICS 2023)

Abstract

Vulnerable source code in software applications causes significant reliability and security issues, which can be mitigated by integrating and assuring software security principles during the early stages of the development lifecycle. One promising approach to identifying vulnerabilities in source code is the use of Artificial Intelligence (AI). This research proposes an AI-based method for detecting source code vulnerabilities and leverages Explainable AI to help developers identify and understand vulnerable source code tokens. To train the model, a web crawler was used to collect a real-world dataset of 600,000 source code samples, which were annotated using static analysers. Several machine learning (ML) classifiers were tested on a feature vector generated using Natural Language Processing techniques. The Random Forest and Extreme Gradient Boosting classifiers were found to perform well in the binary and multi-class approaches, respectively. The proposed model achieved an F1-score of 0.96 in binary classification and 0.85 in multi-class classification based on Common Weakness Enumeration (CWE) IDs. The model, trained on a dataset of real source code, is highly generalisable and has been integrated into a live web portal to validate its performance on real-world code vulnerabilities.
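As a rough illustration of two ideas named in the abstract, the sketch below turns source code into a token-based feature vector and computes a binary F1-score. It is not the authors' implementation: the abstract does not state which tokeniser or weighting scheme was used, so the regular-expression tokeniser, the smoothed TF-IDF weighting, the toy C snippets, and the names `tokenize`, `tfidf_vectors`, and `f1_score` are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: (C snippet, label), where 1 means "flagged as
# vulnerable" and 0 means "not flagged". These are NOT samples from the paper.
SAMPLES = [
    ("char buf[8]; strcpy(buf, input);", 1),
    ("char buf[8]; gets(buf);", 1),
    ("char buf[8]; strncpy(buf, input, sizeof(buf) - 1);", 0),
    ("char buf[8]; fgets(buf, sizeof(buf), stdin);", 0),
]

# Assumed tokeniser: identifiers, integer literals, or single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(code):
    """Split a code string into a flat list of lexical tokens."""
    return TOKEN_RE.findall(code)


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using raw term counts times smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of snippets in which each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, so tokens present in every snippet get weight 1.0.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    codes = [code for code, _ in SAMPLES]
    labels = [label for _, label in SAMPLES]
    vocab, vectors = tfidf_vectors(codes)
    print(len(vocab), "features per snippet")
    # A made-up prediction that misses one vulnerable snippet.
    print(f1_score(labels, [1, 0, 0, 0]))
```

In a fuller pipeline, vectors like these would be passed to a classifier such as Random Forest or Extreme Gradient Boosting, and a multi-class evaluation over CWE IDs would average per-class F1-scores rather than use the single positive class shown here.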


Author information

Corresponding author

Correspondence to Sampath Rajapaksha.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rajapaksha, S., Senanayake, J., Kalutarage, H., Al-Kadri, M.O. (2024). Enhancing Security Assurance in Software Development: AI-Based Vulnerable Code Detection with Static Analysis. In: Katsikas, S., et al. Computer Security. ESORICS 2023 International Workshops. ESORICS 2023. Lecture Notes in Computer Science, vol 14399. Springer, Cham. https://doi.org/10.1007/978-3-031-54129-2_20

  • DOI: https://doi.org/10.1007/978-3-031-54129-2_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54128-5

  • Online ISBN: 978-3-031-54129-2

  • eBook Packages: Computer Science, Computer Science (R0)
