Vulnerability Detection with Representation Learning

Wang, Zhiqiang; Meng, Sulong; Chen, Ying

doi:10.1007/978-981-99-0272-9_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1768)

Included in the following conference series:

International Conference on Ubiquitous Security

Abstract

Identifying potentially vulnerable code in software systems is essential. Deep neural network techniques have been applied to vulnerability detection, but existing methods usually neglect the feature representation of vulnerability datasets, which leads to unsatisfactory model performance. A vulnerability detection technique should achieve high accuracy, a relatively high true-positive rate, and a low false-negative rate. It should also be able to detect vulnerabilities in real-world projects without requiring additional expert knowledge or tedious configuration. In this article, we propose and implement VDDRL (a Vulnerability Detection method based on Deep Representation Learning), which combines feature extraction and ensemble learning. VDDRL uses the word2vec model to convert source code into a vector representation. LSTM models then learn deep representations of vulnerable code from its token sequences, and traditional machine learning algorithms are trained on these representations for classification. The training dataset is derived from real-world projects and contains seven different types of vulnerabilities. In comparative experiments on these datasets, VDDRL achieves an Accuracy of 95.6%–98.7%, a Precision of 91.6%–99.0%, a Recall of 84.7%–99.5%, and an F1 of 88.1%–99.2%, outperforming the baseline method. Our experimental results show that VDDRL is a generic, lightweight, and extensible vulnerability detection method that offers better performance and robustness than the other methods compared.
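The four metrics reported in the abstract follow the standard definitions for binary classification, and a short sketch may help readers relate them to the true-positive and false-negative rates mentioned earlier. The code below is an illustration under stated assumptions, not the authors' evaluation code: the names `ConfusionCounts`, `count_outcomes`, and `detection_metrics` are hypothetical, label 1 is assumed to mean "vulnerable", and the counts in the usage example are invented for demonstration and are not results from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary detector (hypothetical helper type)."""
    tp: int  # vulnerable samples flagged as vulnerable
    fp: int  # clean samples flagged as vulnerable
    tn: int  # clean samples flagged as clean
    fn: int  # vulnerable samples flagged as clean


def count_outcomes(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally outcomes, assuming label 1 = vulnerable and 0 = clean."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute Accuracy, Precision, Recall, and F1 from outcome counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # recall is the true-positive rate
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "false_negative_rate": 1.0 - recall if (c.tp + c.fn) else 0.0,
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    example = ConfusionCounts(tp=90, fp=10, tn=880, fn=20)
    for name, value in detection_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Because Recall equals the true-positive rate and the false-negative rate is its complement, a high Recall directly implies the low false-negative rate that the abstract asks of a detector.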

Z. Wang and S. Meng contributed equally to this work and should be regarded as co-first authors.

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB0803401, in part by the China Postdoctoral Science Foundation funded project (2019M650606), and in part by the First-class Discipline Construction Project of Beijing Electronic Science and Technology Institute (3201012).

Author information

Authors and Affiliations

Beijing Electronic Science and Technology Institute, Beijing, 100071, China
Zhiqiang Wang, Sulong Meng & Ying Chen
State Information Center, Beijing, 100045, China
Zhiqiang Wang

Corresponding author

Correspondence to Ying Chen.

Editor information

Editors and Affiliations

Guangzhou University, Guangzhou, China
Guojun Wang
University of Texas at San Antonio, San Antonio, TX, USA
Kim-Kwang Raymond Choo
Temple University, Philadelphia, PA, USA
Jie Wu
Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
Ernesto Damiani

About this paper

Cite this paper

Wang, Z., Meng, S., Chen, Y. (2023). Vulnerability Detection with Representation Learning. In: Wang, G., Choo, KK.R., Wu, J., Damiani, E. (eds) Ubiquitous Security. UbiSec 2022. Communications in Computer and Information Science, vol 1768. Springer, Singapore. https://doi.org/10.1007/978-981-99-0272-9_8

DOI: https://doi.org/10.1007/978-981-99-0272-9_8
Published: 16 February 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0271-2
Online ISBN: 978-981-99-0272-9
eBook Packages: Computer Science, Computer Science (R0)
