research-article

VulD-Transformer: Source Code Vulnerability Detection via Transformer

Authors:

Boyang Xiao

Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware

Pages 185 - 193

https://doi.org/10.1145/3609437.3609451

Published: 05 October 2023

Abstract

Detecting software vulnerabilities is an important and challenging problem. Existing studies have shown that deep learning-based approaches can significantly improve vulnerability detection performance because they automatically learn semantically rich code representations. However, deep learning-based source code vulnerability detection methods still have limited ability to learn long-range contextual dependencies between code statements. In this paper, we propose VulD-Transformer, a deep learning-based, code slice-level vulnerability detection approach built on the Transformer and designed to detect vulnerabilities more effectively. VulD-Transformer uses a Transformer model to capture the critical vulnerability features of long code slices. Specifically, we first obtain code slices that contain data dependencies and control dependencies by extracting vulnerability syntax features and the programs' Program Dependency Graphs. Then, to improve the model's ability to learn features from distant code statements, we design a Transformer-based vulnerability detection model. Experimental results on four synthetic datasets show that, compared with the VulDeePecker, SySeVR-BGRU, SySeVR-ABGRU, and Russell approaches, VulD-Transformer improves accuracy, recall, and F1-measure by 6.12%, 8.01%, and 7.63% on average, respectively, when code slices are longer than 256 tokens. In addition, on two real-world source code vulnerability datasets, Devign and REVEAL, VulD-Transformer improves accuracy, recall, and F1-measure over these baselines by 9.01%, 38.51%, and 20.98% on average, respectively.
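
The abstract reports its results as accuracy, recall, and F1-measure. As a reminder of how these metrics are conventionally computed for a binary vulnerable / non-vulnerable classifier, here is a minimal sketch. It is not taken from the paper: the names `Confusion`, `count_confusion`, `accuracy`, `precision`, `recall`, and `f1` are hypothetical, and the labels are made-up illustration data (1 = vulnerable, 0 = not vulnerable), not results from any dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier (hypothetical helper)."""
    tp: int  # vulnerable samples predicted vulnerable
    fp: int  # non-vulnerable samples predicted vulnerable
    tn: int  # non-vulnerable samples predicted non-vulnerable
    fn: int  # vulnerable samples predicted non-vulnerable


def count_confusion(y_true, y_pred):
    """Count TP/FP/TN/FN from parallel lists of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return Confusion(tp, fp, tn, fn)


def accuracy(c):
    """Fraction of all samples classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c):
    """Fraction of predicted-vulnerable samples that are truly vulnerable."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c):
    """Fraction of truly vulnerable samples that were detected."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1(c):
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
c = count_confusion(y_true, y_pred)
print(c, accuracy(c), recall(c), f1(c))
```

Recall matters most when a missed vulnerability is costly, and F1 balances it against precision. The abstract does not state whether its reported improvements are absolute percentage points or relative gains, so this sketch makes no assumption on that point.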

Index Terms

VulD-Transformer: Source Code Vulnerability Detection via Transformer
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation
  2. Software and application security
    1. Web application security

Published In

Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware

August 2023

332 pages

ISBN:9798400708947

DOI:10.1145/3609437

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Natural Science Foundation of Gansu Province
National Natural Science Foundation of China
Gansu Provincial Department of Education industry support project
State Grid Technology Project

Conference

Internetware 2023

Internetware 2023: 14th Asia-Pacific Symposium on Internetware

August 4 - 6, 2023

Hangzhou, China

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%

Cited By

Corona-Fraga P, Hernandez-Suarez A, Sanchez-Perez G, Toscano-Medina L, Perez-Meana H, Portillo-Portillo J, Olivares-Mercado J, García Villalba L. (2025). Question–Answer Methodology for Vulnerable Source Code Review via Prototype-Based Model-Agnostic Meta-Learning. Future Internet 17:1 (33). Online publication date: 14-Jan-2025.
https://doi.org/10.3390/fi17010033
Taghavi Far S, Feyzi F. (2025). Large language models for software vulnerability detection: a guide for researchers on models, methods, techniques, datasets, and metrics. International Journal of Information Security 24:2. Online publication date: 1-Apr-2025.
https://dl.acm.org/doi/10.1007/s10207-025-00992-7
He Y, Lin G, Ma X, Keung J, Tan C, Hu W, Li F. (2024). Enhancing Deep Learning Vulnerability Detection through Imbalance Loss Functions: An Empirical Study. Proceedings of the 15th Asia-Pacific Symposium on Internetware (85-94). Online publication date: 24-Jul-2024.
https://dl.acm.org/doi/10.1145/3671016.3671379
Alevizos V, Papakostas G, Simasiku A, Malliarou D, Messinis A, Edralin S, Xu C, Yue Z. (2024). Integrating Artificial Open Generative Artificial Intelligence into Software Supply Chain Security. 2024 5th International Conference on Data Analytics for Business and Industry (ICDABI) (200-206). Online publication date: 23-Oct-2024.
https://doi.org/10.1109/ICDABI63787.2024.10800301
Guo H, Zhang X, Zhang Z, Shen Y. (2024). Detecting Vulnerabilities via Explicitly Leveraging Vulnerability Features on Program Slices. Theoretical Aspects of Software Engineering (165-185). Online publication date: 14-Jul-2024.
https://doi.org/10.1007/978-3-031-64626-3_10
