Abstract
As software continues to grow in size and complexity, detecting software vulnerabilities becomes increasingly challenging. Traditional static and dynamic analysis methods often suffer from poor accuracy or heavy reliance on expert knowledge. In recent years, deep learning has emerged as a promising direction in this field because it can automatically learn subtle features from large volumes of software data. However, existing deep learning-based vulnerability detection methods have two limitations: 1) they struggle to process long source code sequences effectively, which leads to sub-optimal feature representations; 2) although they can capture similar vulnerability features across different vulnerable programs, they often fail to leverage these features explicitly, which slightly degrades detection performance. In this paper, we propose DV-LVF, a vulnerability detection method that explicitly leverages vulnerability features on program slices to detect vulnerabilities at both the function and statement levels. Specifically, we introduce a Gated Recurrent Unit (GRU)-based statement embedding technique, combined with program slicing, to enhance program feature representation. We also introduce a vulnerability dictionary (vulDict) that explicitly summarizes and exploits these vulnerability features to improve detection performance. Our evaluation on real-world software shows that DV-LVF outperforms state-of-the-art methods in both function-level and statement-level detection.
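The abstract says that DV-LVF embeds statements with a GRU over program slices but does not spell out how. As a hedged illustration only, the following minimal sketch assumes a standard GRU cell (the gated formulation evaluated by Chung et al., 2014) and whitespace-separated tokens. It shows how each statement in a slice could be compressed into one fixed-size vector, so that a long slice becomes a much shorter sequence of statement vectors. The names `TokenEmbedder`, `GRUCell`, `embed_statement` and `embed_slice`, the random untrained weights, and the dimensions are all hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Matrix (list of rows) times vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(*vs):
    # Element-wise sum of equally sized vectors.
    return [sum(xs) for xs in zip(*vs)]


class TokenEmbedder:
    """Hypothetical token lookup table: each unseen token gets a random vector."""

    def __init__(self, dim, seed=1):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def __call__(self, token):
        if token not in self.table:
            self.table[token] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
        return self.table[token]


class GRUCell:
    """Standard GRU cell; weights are random and untrained (illustration only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Update gate (z), reset gate (r) and candidate state (h) parameters.
        self.W_z, self.U_z = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.W_r, self.U_r = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.W_h, self.U_h = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.b_z = [0.0] * hidden_dim
        self.b_r = [0.0] * hidden_dim
        self.b_h = [0.0] * hidden_dim

    def step(self, x, h):
        z = [sigmoid(a) for a in vadd(matvec(self.W_z, x), matvec(self.U_z, h), self.b_z)]
        r = [sigmoid(a) for a in vadd(matvec(self.W_r, x), matvec(self.U_r, h), self.b_r)]
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_cand = [math.tanh(a) for a in vadd(matvec(self.W_h, x), matvec(self.U_h, rh), self.b_h)]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]


def embed_statement(cell, token_vectors):
    # Run the GRU over the statement's tokens; the final hidden state is its embedding.
    h = [0.0] * cell.hidden_dim
    for x in token_vectors:
        h = cell.step(x, h)
    return h


def embed_slice(statements, embedder, cell):
    # One vector per statement: a slice of n statements becomes n vectors
    # instead of one long token sequence.
    return [embed_statement(cell, [embedder(t) for t in s.split()]) for s in statements]


stmts = ["char buf [ 10 ] ;", "strcpy ( buf , src ) ;"]
vecs = embed_slice(stmts, TokenEmbedder(dim=8), GRUCell(input_dim=8, hidden_dim=16))
print(len(vecs), len(vecs[0]))  # 2 16
```

In a real detector, the token table and GRU weights would presumably be trained jointly with the downstream model. The sketch only shows the data flow from a sliced program to a short sequence of statement vectors.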
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Guo, H., Zhang, X., Zhang, Z., Shen, Y. (2024). Detecting Vulnerabilities via Explicitly Leveraging Vulnerability Features on Program Slices. In: Chin, WN., Xu, Z. (eds) Theoretical Aspects of Software Engineering. TASE 2024. Lecture Notes in Computer Science, vol 14777. Springer, Cham. https://doi.org/10.1007/978-3-031-64626-3_10
DOI: https://doi.org/10.1007/978-3-031-64626-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64625-6
Online ISBN: 978-3-031-64626-3
eBook Packages: Computer Science, Computer Science (R0)