
A security vulnerability predictor based on source code metrics

  • Original Paper
  • Published:
Journal of Computer Virology and Hacking Techniques

Abstract

Detecting security vulnerabilities in the source code of software systems is one of the most important challenges in the field of software security. An effective solution is needed to discover and patch vulnerabilities before valuable information is compromised. Security testing is a type of software testing that checks whether software is vulnerable to cyber attacks. This study pursued three main objectives: (1) to identify the vulnerable functions of a C/C++ program based on code metrics, which reduces the cost of software security testing and directs the related activities to the identified vulnerable functions rather than to the entire software; (2) to identify the type of attack associated with each vulnerable function; and (3) to analyze the relationship between code metrics and vulnerabilities, which helps us understand which code structures are most likely to contain vulnerable code. To this end, the paper first builds a comprehensive view of the target software's source code using graph concepts. Second, a set of source code metrics is calculated by crawling the related graph using static analysis. Finally, the vulnerability prediction model presented in this paper applies machine learning techniques to the metrics extracted from the program source code. Compared to previous work, this paper makes new contributions, most notably the very high accuracy of the proposed model in detecting the type of vulnerability. Moreover, 15 code metrics are used to predict vulnerabilities, and our feature-importance analysis indicates which code structures are most likely to be vulnerable. Experimental results on 10 real projects (OpenSSL, SQLite, FreeType, LibTiff, Libxslt, Binutils, FFmpeg, ImageMagick, OpenSC, and rdesktop) show that the proposed security testing predictor correctly identifies, on average, 89% of the truly vulnerable functions in the source code and the vulnerability type of 86% of the detected functions.
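As a rough illustration of the two-stage prediction described above, the following Python sketch trains one classifier to flag vulnerable functions from a per-function vector of code metrics and a second classifier to label the vulnerability type of the flagged functions. The metric layout, the random-forest choice, and the synthetic data are assumptions made for illustration only; they do not reproduce the paper's 15 metrics, models, or dataset.

# Minimal sketch (not the authors' implementation) of metric-based,
# two-stage vulnerability prediction at function granularity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Each row stands in for one C/C++ function described by static code metrics
# (e.g., cyclomatic complexity, nesting depth, parameter count, ...).
X = np.random.rand(1000, 15)            # 1000 functions x 15 metrics (synthetic)
y_vuln = np.random.randint(0, 2, 1000)  # 1 = vulnerable, 0 = not vulnerable
y_type = np.random.randint(0, 5, 1000)  # illustrative vulnerability-type label

X_tr, X_te, yv_tr, yv_te, yt_tr, yt_te = train_test_split(
    X, y_vuln, y_type, test_size=0.2, random_state=0)

# Stage 1: flag functions that are likely vulnerable.
vuln_clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, yv_tr)

# Stage 2: for flagged functions, predict the attack/vulnerability type.
type_clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X_tr[yv_tr == 1], yt_tr[yv_tr == 1])

flagged = vuln_clf.predict(X_te) == 1
predicted_types = type_clf.predict(X_te[flagged])

# Feature importances hint at which code structures correlate with vulnerability.
print(vuln_clf.feature_importances_)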


Notes

  1. https://github.com/puya-pakshad/VulnerabilityDataset.git.

  2. https://dwheeler.com/flawfinder/.

  3. https://code.google.com/archive/p/rough-auditing-tool-for-security/.

  4. http://deployingradius.com/pscan/.

  5. Common Vulnerabilities and Exposures.

  6. National Vulnerability Database.


Author information

Corresponding author

Correspondence to Alireza Shameli-Sendi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 9 shows the hyperparameters of all the machine learning techniques used in this paper. These hyperparameters were obtained using the grid search method.

Table 9 The hyperparameters of all the machine learning techniques used in this paper
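A minimal sketch of how grid search can select such hyperparameters follows, assuming scikit-learn; the estimator, parameter grid, and synthetic data are illustrative and do not reproduce the search spaces or values reported in Table 9.

# Illustrative grid search over a random-forest classifier (assumed setup,
# not the paper's exact configuration).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the per-function metric vectors and labels.
X = np.random.rand(500, 15)
y = np.random.randint(0, 2, 500)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 4],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,          # 5-fold cross-validation over the training data
    scoring="f1",  # pick a metric appropriate to the prediction task
)
search.fit(X, y)
print(search.best_params_)  # the hyperparameter combination kept for the final model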

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Pakshad, P., Shameli-Sendi, A. & Khalaji Emamzadeh Abbasi, B. A security vulnerability predictor based on source code metrics. J Comput Virol Hack Tech 19, 615–633 (2023). https://doi.org/10.1007/s11416-023-00469-y

