
A security vulnerability predictor based on source code metrics

  • Original Paper
  • Published:
Journal of Computer Virology and Hacking Techniques

Abstract

Detecting security vulnerabilities in the source code of software systems is one of the most important challenges in the field of software security. An effective solution is needed to discover and patch vulnerabilities before valuable information is compromised. Security testing is a type of software testing that checks whether software is vulnerable to cyber attacks. This study pursued three main objectives: (1) to identify the vulnerable functions of a C/C++ program based on code metrics, which reduces the cost of software security testing and directs the related activities to the identified vulnerable functions rather than to the entire software; (2) to identify the type of attack associated with each vulnerable function; and (3) to analyze the relationship between code metrics and vulnerabilities, which helps us understand which code structures are most likely to contain vulnerable code. To this end, the paper first builds a comprehensive view of the target software's source code using graph concepts. Second, a set of source code metrics is calculated by crawling the related graph using static analysis. Finally, the vulnerability prediction model presented in this paper applies machine learning techniques to the metrics extracted from the program source code. Compared to previous work, this paper makes new contributions, most notably the very high accuracy of the proposed model in detecting the type of vulnerability. Moreover, 15 code metrics are used to predict vulnerabilities, and our feature-importance analysis indicates which code structures are most likely to be vulnerable. Experimental results on 10 real projects (OpenSSL, SQLite, FreeType, LibTiff, Libxslt, Binutils, FFmpeg, ImageMagick, OpenSC, and rdesktop) show that the proposed security testing predictor correctly identifies, on average, 89% of the truly vulnerable functions in the source code and the vulnerability type of 86% of the detected functions.
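As a rough illustration of the two-stage prediction described above, the following Python sketch trains one classifier to flag vulnerable functions from a per-function vector of code metrics and a second classifier to label the vulnerability type of the flagged functions. The metric layout, the random-forest choice, and the synthetic data are assumptions made for illustration only; they do not reproduce the paper's 15 metrics, models, or dataset.

# Minimal sketch (not the authors' implementation) of metric-based,
# two-stage vulnerability prediction at function granularity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Each row stands in for one C/C++ function described by static code metrics
# (e.g., cyclomatic complexity, nesting depth, parameter count, ...).
X = np.random.rand(1000, 15)            # 1000 functions x 15 metrics (synthetic)
y_vuln = np.random.randint(0, 2, 1000)  # 1 = vulnerable, 0 = not vulnerable
y_type = np.random.randint(0, 5, 1000)  # illustrative vulnerability-type label

X_tr, X_te, yv_tr, yv_te, yt_tr, yt_te = train_test_split(
    X, y_vuln, y_type, test_size=0.2, random_state=0)

# Stage 1: flag functions that are likely vulnerable.
vuln_clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, yv_tr)

# Stage 2: for flagged functions, predict the attack/vulnerability type.
type_clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X_tr[yv_tr == 1], yt_tr[yv_tr == 1])

flagged = vuln_clf.predict(X_te) == 1
predicted_types = type_clf.predict(X_te[flagged])

# Feature importances hint at which code structures correlate with vulnerability.
print(vuln_clf.feature_importances_)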


Notes

  1. https://github.com/puya-pakshad/VulnerabilityDataset.git.

  2. https://dwheeler.com/flawfinder/.

  3. https://code.google.com/archive/p/rough-auditing-tool-for-security/.

  4. http://deployingradius.com/pscan/.

  5. Common Vulnerabilities and Exposures.

  6. National Vulnerability Database.


Author information

Corresponding author

Correspondence to Alireza Shameli-Sendi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 9 shows the hyperparameters of all the machine learning techniques used in this paper. These hyperparameters were obtained using the grid search method.

Table 9 The hyperparameters of all the machine learning techniques used in this paper
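A minimal sketch of how grid search can select such hyperparameters follows, assuming scikit-learn; the estimator, parameter grid, and synthetic data are illustrative and do not reproduce the search spaces or values reported in Table 9.

# Illustrative grid search over a random-forest classifier (assumed setup,
# not the paper's exact configuration).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the per-function metric vectors and labels.
X = np.random.rand(500, 15)
y = np.random.randint(0, 2, 500)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 4],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,          # 5-fold cross-validation over the training data
    scoring="f1",  # pick a metric appropriate to the prediction task
)
search.fit(X, y)
print(search.best_params_)  # the hyperparameter combination kept for the final model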

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Pakshad, P., Shameli-Sendi, A. & Khalaji Emamzadeh Abbasi, B. A security vulnerability predictor based on source code metrics. J Comput Virol Hack Tech 19, 615–633 (2023). https://doi.org/10.1007/s11416-023-00469-y

