DOI: 10.1145/3661167.3661211
research-article
Open access

VulDL: Tree-based and Graph-based Neural Networks for Vulnerability Detection and Localization

Published: 18 June 2024

Abstract

With the dramatic increase in the number and size of software systems in industry, a great deal of research has sought to detect vulnerabilities automatically. However, existing detection methods are limited in both code semantic modeling and detection granularity, so they cannot deliver high accuracy and fine granularity at the same time. In this paper, we propose a general framework, VulDL, which identifies whether a given code snippet contains a vulnerability and locates the specific line of code where the vulnerability resides. VulDL first represents the source code as a novel semantic data structure, the adapted code property graph. Tree-based and graph-based neural networks then learn features according to the hierarchies and neighborhoods in this graph and perform vulnerability identification and localization. Our evaluation shows that VulDL achieves F1-scores of 98.68% and 94.85% in identifying buffer error and resource management error vulnerabilities, respectively, and 97.73% on the two categories combined. On the FFmpeg+QEMU dataset, VulDL achieves an F1-score of 59.62%, outperforming existing methods. In addition, VulDL locates vulnerabilities at statement granularity with F1-scores of 97.88%, 98.31%, and 99.16% on the evaluated datasets.
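To make the statement-granularity figures concrete, the following is a minimal sketch of how an F1-score over predicted vulnerable lines can be computed. It uses the standard definition of precision, recall, and F1; the function name `statement_f1` and the example line numbers are hypothetical, and the abstract does not state exactly how VulDL's localization score is aggregated, so this should be read as an illustration rather than the paper's evaluation procedure.

```python
def statement_f1(predicted_lines, vulnerable_lines):
    """Precision, recall, and F1 over sets of line numbers.

    Hypothetical helper: a line counts as a true positive when it is
    both predicted as vulnerable and labeled as vulnerable.
    """
    predicted = set(predicted_lines)
    actual = set(vulnerable_lines)

    true_pos = len(predicted & actual)
    false_pos = len(predicted - actual)
    false_neg = len(actual - predicted)

    # Guard against empty denominators when nothing is predicted or labeled.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative values only: the model flags lines 12, 13, 20,
# while the labeled vulnerable lines are 12, 13, 14.
precision, recall, f1 = statement_f1([12, 13, 20], [12, 13, 14])
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Because F1 is a harmonic mean, a high score requires that few vulnerable statements are missed and few clean statements are flagged at the same time.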

    Published In

    EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
    June 2024
    728 pages
    ISBN:9798400717017
    DOI:10.1145/3661167
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Graph Neural Network
    2. Vulnerability Detection

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    EASE 2024

    Acceptance Rates

    Overall Acceptance Rate 71 of 232 submissions, 31%
