Abstract
It is essential to identify potentially vulnerable code in software systems. Deep neural network techniques have been applied to vulnerability detection. However, existing methods usually ignore the feature representation of vulnerability datasets, which leads to unsatisfactory model performance. A practical vulnerability detection technique should achieve high accuracy, a relatively high true-positive rate, and a low false-negative rate. It should also be able to detect vulnerabilities in real-world projects without requiring additional expert knowledge or tedious configuration. In this article, we propose and implement VDDRL (A Vulnerability Detection Method Based On Deep Representation Learning), which combines feature extraction and ensemble learning. VDDRL uses a word2vec model to convert source code into vector representations. An LSTM model then learns deep representations of vulnerable code from code token sequences, and these representations are used to train classifiers based on traditional machine learning algorithms. Our training dataset is derived from real-world projects and covers seven types of vulnerabilities. In comparative experiments, VDDRL achieves an Accuracy of 95.6%–98.7%, a Precision of 91.6%–99.0%, a Recall of 84.7%–99.5%, and an F1 score of 88.1%–99.2%, all of which exceed those of the baseline method. These results show that VDDRL is a generic, lightweight, and extensible vulnerability detection method that offers better performance and robustness than the compared methods.
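The pipeline summarized above (token sequence, word2vec embedding, LSTM representation, evaluation metrics) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the regex lexer `TOKEN_RE`, the pseudo-random `embed` function standing in for trained word2vec vectors, the untrained `LSTMEncoder`, the dimensions, and the example snippet are all hypothetical, and training is omitted. The `metrics` helper shows how Accuracy, Precision, Recall, and F1 follow from confusion-matrix counts. Because the false-negative rate equals 1 − Recall, the reported Recall range of 84.7%–99.5% corresponds to a false-negative rate of 0.5%–15.3%.

```python
import math
import random
import re

# Hypothetical lexer: identifiers/keywords, integer literals, double- or single-quoted
# strings, and any other single non-space character (operators, punctuation).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\"[^\"]*\"|'[^']*'|\S")


def tokenize(code):
    """Split a source-code snippet into a flat token sequence."""
    return TOKEN_RE.findall(code)


def embed(token, dim=8):
    """Stand-in for a trained word2vec lookup: a fixed pseudo-random vector per token.

    Seeding random.Random with the token string makes the vector reproducible.
    """
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMEncoder:
    """Single-layer LSTM with untrained random weights; the final hidden state
    serves as the fixed-length representation of a token sequence."""

    GATES = "ifog"  # input, forget, output, candidate

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Each gate has one weight row per hidden unit, applied to [x_t; h_{t-1}].
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                   for _ in range(hidden_dim)]
            for gate in self.GATES
        }
        self.b = {gate: [0.0] * hidden_dim for gate in self.GATES}

    def _gate(self, gate, z, activation):
        return [activation(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(self.W[gate], self.b[gate])]

    def encode(self, vectors):
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for x in vectors:
            z = x + h  # list concatenation: [x_t; h_{t-1}]
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def metrics(tp, fp, tn, fn):
    """Accuracy, Precision, Recall (true-positive rate) and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Usage: encode one hypothetical SQL-injection-style Python line.
snippet = 'cursor.execute("SELECT * FROM users WHERE id = " + user_id)'
tokens = tokenize(snippet)
encoder = LSTMEncoder(input_dim=8, hidden_dim=4)
representation = encoder.encode([embed(t) for t in tokens])
```

Replacing `embed` with vectors from a word2vec model trained on the code corpus, and learning the LSTM weights on labelled snippets, would turn this sketch into a working feature extractor whose output vectors are then passed to conventional classifiers. Which classifiers and which ensemble scheme VDDRL uses is not stated in this abstract.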
Z. Wang and S. Meng contributed equally to this work and should be regarded as co-first authors.
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB0803401, in part by a project funded by the China Postdoctoral Science Foundation (2019M650606), and in part by the First-class Discipline Construction Project of Beijing Electronic Science and Technology Institute (3201012).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, Z., Meng, S., Chen, Y. (2023). Vulnerability Detection with Representation Learning. In: Wang, G., Choo, KK.R., Wu, J., Damiani, E. (eds) Ubiquitous Security. UbiSec 2022. Communications in Computer and Information Science, vol 1768. Springer, Singapore. https://doi.org/10.1007/978-981-99-0272-9_8
DOI: https://doi.org/10.1007/978-981-99-0272-9_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0271-2
Online ISBN: 978-981-99-0272-9
eBook Packages: Computer Science, Computer Science (R0)