DOI: 10.1145/3609437.3609451
research-article

VulD-Transformer: Source Code Vulnerability Detection via Transformer

Published: 05 October 2023

Abstract

Detecting software vulnerabilities is an important and challenging problem. Existing studies have shown that deep learning-based approaches can significantly improve vulnerability detection because they automatically learn semantically rich code representations. However, deep learning-based source code vulnerability detection methods still have a limited ability to learn long-range contextual dependencies between code statements. In this paper, we propose VulD-Transformer, a deep learning-based, code-slice-level vulnerability detection approach built on the Transformer and designed to detect vulnerabilities more effectively. In VulD-Transformer, a Transformer model captures the critical vulnerability features of long code slices. Specifically, we first obtain code slices containing data dependencies and control dependencies by extracting vulnerability syntax features and the programs' program dependency graphs (PDGs). Next, to improve the model's ability to learn features from distant code statements, we design a Transformer-based vulnerability detection model. Experimental results on four synthetic datasets show that, compared with the VulDeePecker, SySeVR-BGRU, SySeVR-ABGRU, and Russell approaches, VulD-Transformer achieves average improvements of 6.12%, 8.01%, and 7.63% in accuracy, recall, and F1-measure, respectively, when the code slices are longer than 256 tokens. In addition, on two real-world source code vulnerability datasets, Devign and REVEAL, VulD-Transformer achieves average improvements over the same baselines of 9.01%, 38.51%, and 20.98% in accuracy, recall, and F1-measure, respectively.
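The slicing step described above can be illustrated with a minimal sketch. It assumes a hypothetical toy PDG in which each statement lists the statements it depends on, tagged as data or control dependencies, and it computes a backward slice from a hypothetical slicing criterion (a `memcpy` call standing in for a vulnerability syntax feature). The graph, the statement numbering, and the `backward_slice` helper are illustrative assumptions, not VulD-Transformer's actual slicer, which works on PDGs extracted from real programs.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical toy PDG: statement id -> list of (statement it depends on, edge kind).
PDG = Dict[int, List[Tuple[int, str]]]

# Hypothetical C statements, numbered in source order.
STATEMENTS = {
    1: "int n = read_int();",
    2: "char buf[16];",
    3: "int k = 0;",
    4: "if (n > 0)",
    5: "memcpy(buf, src, n);",
    6: "k = k + 1;",
}

EDGES: PDG = {
    4: [(1, "data")],                               # the condition reads n
    5: [(1, "data"), (2, "data"), (4, "control")],  # memcpy uses n and buf, guarded by statement 4
    6: [(3, "data")],                               # k depends only on its initialisation
}


def backward_slice(pdg: PDG, criterion: int) -> List[int]:
    """Collect every statement the criterion transitively depends on
    through data or control edges, returned in source order."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        stmt = queue.popleft()
        for dep, _kind in pdg.get(stmt, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


if __name__ == "__main__":
    # Slicing criterion: the memcpy call (a sensitive API use) on statement 5.
    slice_ids = backward_slice(EDGES, 5)
    print(slice_ids)
    print("\n".join(STATEMENTS[i] for i in slice_ids))
```

Statements 3 and 6 drop out because the `memcpy` call depends on neither of them; the remaining statements, once tokenized, would form one input sample for a slice-level detector.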
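Why a Transformer suits long slices can be seen from self-attention itself: it relates every pair of positions in a single step, however far apart they are, whereas a recurrent model such as a BGRU must carry information through every intermediate token. The sketch below is the generic scaled dot-product attention of Vaswani et al. [30] in plain Python, with one head and no learned projections; it is not the VulD-Transformer model. The 300-token toy sequence and the `matmul`, `softmax`, and `attention` helpers are illustrative assumptions, with the length chosen only to exceed the 256-token threshold mentioned above.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


if __name__ == "__main__":
    n = 300  # toy slice length, longer than 256 tokens
    # Only the last token's key matches the query issued from position 0.
    keys = [[0.0, 0.0] for _ in range(n)]
    keys[-1] = [10.0, 0.0]
    values = [[0.0] for _ in range(n)]
    values[-1] = [1.0]
    out = attention([[10.0, 0.0]], keys, values)
    print(round(out[0][0], 4))
```

The first position retrieves the last token's value almost exactly, even though 298 tokens separate them; the cost is that attention scores grow quadratically with slice length.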
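The metrics named in the abstract follow the standard binary-classification definitions, with vulnerable slices as the positive class. The sketch below computes them from confusion-matrix counts; precision is included only because F1 needs it. The counts and recall values in the example are invented for illustration and are not results from the paper. The abstract does not say whether its improvements are absolute (percentage points) or relative, so the hypothetical `absolute_gain` and `relative_gain` helpers show how far the two readings can differ.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_scores(tp: int, fp: int, tn: int, fn: int) -> Scores:
    """Accuracy, precision, recall and F1 with 'vulnerable' as the positive class."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(accuracy, precision, recall, f1)


def absolute_gain(new: float, old: float) -> float:
    """Improvement in percentage points."""
    return (new - old) * 100


def relative_gain(new: float, old: float) -> float:
    """Improvement as a percentage of the baseline value."""
    return (new - old) / old * 100


if __name__ == "__main__":
    # Hypothetical counts for 1,000 code slices (not data from the paper).
    print(binary_scores(tp=80, fp=20, tn=880, fn=20))
    # Hypothetical baseline recall 0.70 vs. new recall 0.78.
    print(round(absolute_gain(0.78, 0.70), 2), round(relative_gain(0.78, 0.70), 2))
```

With these invented numbers, the same recall change is 8 percentage points in absolute terms but about 11.43% relative to the baseline.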

References

[1]
[n. d.]. Common Vulnerabilities and Exposures. https://cve.mitre.org/
[2]
Amritanshu Agrawal and Tim Menzies. 2018. Is "Better Data" Better Than "Better Data Miners"?. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). 1050–1061. https://doi.org/10.1145/3180155.3180197
[3]
Wenyan An, Liwei Chen, Jinxin Wang, Gewangzi Du, Gang Shi, and Dan Meng. 2020. AVDHRAM: Automated Vulnerability Detection based on Hierarchical Representation and Attention Mechanism. In 2020 IEEE Intl Conf on Parallel and Distributed Processing with Applications, Big Data and Cloud Computing, Sustainable Computing and Communications, Social Computing and Networking (ISPA/BDCloud/SocialCom/SustainCom). 337–344. https://doi.org/10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00068
[4]
Mohammad Taneem Bin Nazim, Md Jobair Hossain Faruk, Hossain Shahriar, Md Abdullah Khan, Mohammad Masum, Nazmus Sakib, and Fan Wu. 2022. Systematic Analysis of Deep Learning Model for Vulnerable Code Detection. In 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC). 1768–1773. https://doi.org/10.1109/COMPSAC54236.2022.00281
[5]
Sicong Cao, Xiaobing Sun, Lili Bo, Ying Wei, and Bin Li. 2021. BGNN4VD: Constructing Bidirectional Graph Neural-Network for Vulnerability Detection. Inf. Softw. Technol. 136, C (aug 2021). https://doi.org/10.1016/j.infsof.2021.106576
[6]
Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, and Baishakhi Ray. 2022. Deep Learning Based Vulnerability Detection: Are We There Yet? IEEE Transactions on Software Engineering 48, 9 (2022), 3280–3296. https://doi.org/10.1109/TSE.2021.3087402
[7]
Xiao Deng, Wei Ye, Xie Rui, and Shikun Zhang. 2023. Survey of Source Code Bug Detection Based on Deep Learning. Journal of Software 34, 2 (2023), 625–654. https://doi.org/10.13328/j.cnki.jos.006696
[8]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
[9]
Xu Duan, Jingzheng Wu, Shouling Ji, Zhiqing Rui, Tianyue Luo, Mutian Yang, and Yanjun Wu. 2019. VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (Macao, China) (IJCAI’19). AAAI Press, 4665–4671.
[10]
Adanma Cecilia Eberendu, Valentine Ikechukwu Udegbe, Edmond Onwubiko Ezennorom, Anita Chinonso Ibegbulam, and Titus Ifeanyi Chinebu. 2022. A Systematic Literature Review of Software Vulnerability Detection. European Journal of Computer Science and Information Technology 10, 1 (2022), 23–37.
[11]
Hantao Feng, Xiaotong Fu, Hongyu Sun, He Wang, and Yuqing Zhang. 2020. Efficient Vulnerability Detection based on abstract syntax tree and Deep Learning. In IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). 722–727. https://doi.org/10.1109/INFOCOMWKSHPS50562.2020.9163061
[12]
Qi Feng, Chendong Feng, and Weijiang Hong. 2020. Graph Neural Network-based Vulnerability Predication. In 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME). 800–801.
[13]
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 1536–1547. https://doi.org/10.18653/v1/2020.findings-emnlp.139
[14]
Seyed Mohammad Ghaffarian and Hamid Reza Shahriari. 2021. Neural software vulnerability analysis using rich intermediate graph representations of programs. Information Sciences 553 (2021), 189–207. https://doi.org/10.1016/j.ins.2020.11.053
[15]
Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin B. Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. 2021. GraphCodeBERT: Pre-training Code Representations with Data Flow. In 9th International Conference on Learning Representations. OpenReview.net. https://openreview.net/forum?id=jLoC4ez43PZ
[16]
Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of Tricks for Efficient Text Classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Association for Computational Linguistics, Valencia, Spain, 427–431. https://aclanthology.org/E17-2068
[17]
Jian Li, Pinjia He, Jieming Zhu, and Michael R. Lyu. 2017. Software Defect Prediction via Convolutional Neural Network. In 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS). 318–328. https://doi.org/10.1109/QRS.2017.42
[18]
Yun Li, Chenlin Huang, Zhongfeng Wang, Lu Yuan, and Xiaochuan Wang. 2020. Survey of software vulnerability mining methods based on machine learning. Journal of Software 31, 7 (2020), 2040–2061.
[19]
Yi Li, Shaohua Wang, Tien N. Nguyen, and Son Van Nguyen. 2019. Improving Bug Detection via Context-Based Code Representation Learning and Attention-Based Neural Networks. Proc. ACM Program. Lang. 3, OOPSLA, Article 162 (oct 2019), 30 pages. https://doi.org/10.1145/3360588
[20]
Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Yawei Zhu, and Zhaoxuan Chen. 2022. SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities. IEEE Transactions on Dependable and Secure Computing 19, 4 (2022), 2244–2258. https://doi.org/10.1109/TDSC.2021.3051525
[21]
Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. 2018. VulDeePecker: A Deep Learning-Based System for Vulnerability Detection. ArXiv abs/1801.01681 (2018).
[22]
Guanjun Lin, Jun Zhang, Wei Luo, Lei Pan, Yang Xiang, Olivier De Vel, and Paul Montague. 2018. Cross-Project Transfer Representation Learning for Vulnerable Function Discovery. IEEE Transactions on Industrial Informatics 14, 7 (2018), 3289–3297. https://doi.org/10.1109/TII.2018.2821768
[23]
Henning Perl, Sergej Dechand, Matthew Smith, Daniel Arp, Fabian Yamaguchi, Konrad Rieck, Sascha Fahl, and Yasemin Acar. 2015. VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assist Code Audits. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS ’15). Association for Computing Machinery, New York, NY, USA, 426–437. https://doi.org/10.1145/2810103.2813604
[24]
Michael Pradel and Koushik Sen. 2018. DeepBugs: A Learning Approach to Name-Based Bug Detection. Proc. ACM Program. Lang. 2, OOPSLA, Article 147 (oct 2018), 25 pages. https://doi.org/10.1145/3276517
[25]
Alec Radford and Karthik Narasimhan. 2018. Improving Language Understanding by Generative Pre-Training.
[26]
Rebecca Russell, Louis Kim, Lei Hamilton, Tomo Lazovich, Jacob Harer, Onur Ozdemir, Paul Ellingwood, and Marc McConley. 2018. Automated Vulnerability Detection in Source Code Using Deep Representation Learning. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). 757–762. https://doi.org/10.1109/ICMLA.2018.00120
[27]
Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich. 2018. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 4263–4272. https://doi.org/10.18653/v1/D18-1458
[28]
Gregory Tassey. 2002. The Economic Impacts of Inadequate Infrastructure for Software Testing. National Institute of Standards and Technology.
[29]
Chandra Thapa, Seung Ick Jang, Muhammad Ejaz Ahmed, Seyit Camtepe, Josef Pieprzyk, and Surya Nepal. 2022. Transformer-Based Language Models for Software Vulnerability Detection. In Proceedings of the 38th Annual Computer Security Applications Conference (Austin, TX, USA) (ACSAC ’22). Association for Computing Machinery, New York, NY, USA, 481–496. https://doi.org/10.1145/3564625.3567985
[30]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 6000–6010.
[31]
Haitao Wang, Jie He, Xiaohong Zhang, and Shufen Liu. 2020. A Short Text Classification Method Based on N-Gram and CNN. Chinese Journal of Electronics 29, 2 (2020), 248–254. https://doi.org/10.1049/cje.2020.01.001
[32]
Shizhong Wu. 2009. Review and outlook of information security vulnerability analysis. Journal of Tsinghua University (Science and Technology) 49 (2009), 2065–2072.
[33]
Shizhong Wu, Tao Guo, Guowei Dong, and Jiajie Wang. 2012. Progress in software vulnerability analysis technology. Journal of Tsinghua University (Science and Technology) 52, 10 (2012), 1309–1319.
[34]
Fabian Yamaguchi, Felix Lindner, and Konrad Rieck. 2011. Vulnerability Extrapolation: Assisted Discovery of Vulnerabilities Using Machine Learning. In Proceedings of the 5th USENIX Conference on Offensive Technologies (San Francisco, CA) (WOOT’11). USENIX Association, USA, 13.
[35]
Xin Zhang, Hongyu Sun, Zhipeng He, MianXue Gu, Jingyu Feng, and Yuqing Zhang. 2022. VDBWGDL: Vulnerability Detection Based On Weight Graph And Deep Learning. In 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). 186–190. https://doi.org/10.1109/DSN-W54100.2022.00039
[36]
Xin Zhou, DongGyun Han, and David Lo. 2021. Assessing Generalizability of CodeBERT. In 2021 IEEE International Conference on Software Maintenance and Evolution (ICSME). 425–436. https://doi.org/10.1109/ICSME52107.2021.00044
[37]
Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019). https://papers.nips.cc/book/advances-in-neural-information-processing-systems-32-2019

Cited By

  • (2025) Question–Answer Methodology for Vulnerable Source Code Review via Prototype-Based Model-Agnostic Meta-Learning. Future Internet 17:1 (33). DOI: 10.3390/fi17010033. Online publication date: 14-Jan-2025.
  • (2025) Large language models for software vulnerability detection: a guide for researchers on models, methods, techniques, datasets, and metrics. International Journal of Information Security 24:2. DOI: 10.1007/s10207-025-00992-7. Online publication date: 1-Apr-2025.
  • (2024) Enhancing Deep Learning Vulnerability Detection through Imbalance Loss Functions: An Empirical Study. Proceedings of the 15th Asia-Pacific Symposium on Internetware, 85–94. DOI: 10.1145/3671016.3671379. Online publication date: 24-Jul-2024.

Information

Published In

ACM Other conferences
Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware
August 2023
332 pages
ISBN:9798400708947
DOI:10.1145/3609437
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 October 2023

Author Tags

  1. Vulnerabilities detection
  2. control dependencies
  3. data dependencies
  4. deep learning
  5. transformer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Internetware 2023

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 141
  • Downloads (last 6 weeks): 9
Reflects downloads up to 20 Feb 2025

Citations

  • (2024) Integrating Artificial Open Generative Artificial Intelligence into Software Supply Chain Security. 2024 5th International Conference on Data Analytics for Business and Industry (ICDABI), 200–206. DOI: 10.1109/ICDABI63787.2024.10800301. Online publication date: 23-Oct-2024.
  • (2024) Detecting Vulnerabilities via Explicitly Leveraging Vulnerability Features on Program Slices. Theoretical Aspects of Software Engineering, 165–185. DOI: 10.1007/978-3-031-64626-3_10. Online publication date: 14-Jul-2024.
