VulDL: Tree-based and Graph-based Neural Networks for Vulnerability Detection and Localization

Authors:

Mutian Yang

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering

Pages 323 - 332

https://doi.org/10.1145/3661167.3661211

Published: 18 June 2024

Abstract

As the number and size of industrial software systems grow rapidly, a large body of research has sought to detect vulnerabilities automatically. However, existing detection methods are limited in both code semantic modeling and detection granularity, so they cannot deliver high accuracy and fine granularity at the same time. In this paper, we propose VulDL, a general framework that identifies whether a given code snippet contains a vulnerability and locates the specific line of code where the vulnerability resides. VulDL first represents the source code as a novel semantic data structure, the adapted code property graph. Tree-based and graph-based neural networks then learn features from the graph's hierarchies and neighborhoods and use them to identify and localize vulnerabilities. Our evaluation shows that VulDL achieves F1-scores of 98.68% and 94.85% in identifying buffer error and resource management error vulnerabilities, respectively, and 97.73% on the two classes combined. On the FFmpeg+QEMU dataset, VulDL achieves an F1-score of 59.62%, outperforming existing methods. In addition, VulDL locates vulnerabilities at statement granularity with F1-scores of 97.88%, 98.31%, and 99.16% on the evaluated datasets.
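
To make the statement-granularity F1-scores above concrete, the following is a minimal sketch of how such a score could be computed from sets of line numbers. It uses the standard definition of F1 as the harmonic mean of precision and recall; the function name `statement_level_f1` and the example line numbers are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Set, Tuple


def statement_level_f1(predicted: Set[int], actual: Set[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted vs. labeled vulnerable lines.

    Assumption: each statement is identified by its line number, and a
    prediction counts as correct only if that exact line is labeled vulnerable.
    """
    true_positives = len(predicted & actual)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: lines flagged by a detector vs. lines labeled vulnerable.
flagged_lines = {12, 13, 27}
labeled_lines = {12, 27, 40}
precision, recall, f1 = statement_level_f1(flagged_lines, labeled_lines)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two of the three flagged lines are correct and two of the three labeled lines are found, so precision, recall, and F1 are all 2/3.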

Index Terms

1. Security and privacy
  1. Systems security
    1. Vulnerability management
      1. Vulnerability scanners

Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering

June 2024

728 pages

ISBN:9798400717017

DOI:10.1145/3661167

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

YuanTu Large Research Infrastructure
National Natural Science Foundation of China
Strategic Priority Research Program of the Chinese Academy of Sciences

Conference

EASE 2024

EASE 2024: 28th International Conference on Evaluation and Assessment in Software Engineering

June 18 - 21, 2024

Salerno, Italy

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%
