DOI: 10.1145/3609437.3609451
research-article

VulD-Transformer: Source Code Vulnerability Detection via Transformer

Published: 05 October 2023

ABSTRACT

Detecting software vulnerabilities is an important and challenging problem. Existing studies have shown that deep learning-based approaches can significantly improve vulnerability detection performance because they automatically learn semantically rich code representations. However, deep learning-based source code vulnerability detection methods still have limited ability to learn remote (long-range) contextual dependencies between code statements. In this paper, we propose VulD-Transformer, a deep learning-based, code slice-level vulnerability detection approach that uses a Transformer model to capture the critical vulnerability features of long code slices. Specifically, we first obtain code slices containing data dependencies and control dependencies by extracting vulnerability syntax features and the programs’ Program Dependency Graphs (PDGs). Then, to improve the model's feature learning capability for remote code statements, we design a Transformer-based vulnerability detection model. Experimental results on four synthetic datasets show that, compared with the VulDeePecker, SySeVR-BGRU, SySeVR-ABGRU, and Russell approaches, VulD-Transformer improves accuracy, recall, and F1-measure by 6.12%, 8.01%, and 7.63% on average, respectively, when the code slices are longer than 256 tokens. In addition, on two real-world source code vulnerability datasets, Devign and REVEAL, VulD-Transformer improves accuracy, recall, and F1-measure by 9.01%, 38.51%, and 20.98% on average, respectively, over these baselines.
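The abstract does not spell out how slices are derived from a Program Dependency Graph, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy program, a hand-written PDG, and a seed statement standing in for a vulnerability syntax feature (here, a `strcpy` call); the names `STATEMENTS`, `PDG_EDGES`, and `extract_slice` are hypothetical. The slice is taken as the union of statements the seed depends on (backward) and statements that depend on the seed (forward), following both data and control edges.

```python
from collections import deque

# Toy program (hypothetical): line number -> statement text.
STATEMENTS = {
    1: "char buf[16];",
    2: "int n = read_len();",
    3: "char *src = read_input();",
    4: "log_start();",
    5: "if (n > 0) {",
    6: "    strcpy(buf, src);",
    7: "}",
    8: "puts(buf);",
}

# Toy PDG (hand-written assumption): (source, target, kind) means
# "target depends on source" through a data or control dependency.
PDG_EDGES = [
    (1, 6, "data"),
    (3, 6, "data"),
    (2, 5, "data"),
    (5, 6, "control"),
    (6, 8, "data"),
    (1, 8, "data"),
]


def _reachable(start, neighbours):
    """Breadth-first search returning every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def extract_slice(seed, edges):
    """Return sorted line numbers in the backward and forward slice of seed."""
    preds, succs = {}, {}
    for src, dst, _kind in edges:
        succs.setdefault(src, set()).add(dst)
        preds.setdefault(dst, set()).add(src)
    backward = _reachable(seed, preds)  # statements the seed depends on
    forward = _reachable(seed, succs)   # statements that depend on the seed
    return sorted(backward | forward)


# Line 6 (the strcpy call) plays the role of the vulnerability syntax feature.
slice_lines = extract_slice(6, PDG_EDGES)
code_slice = [STATEMENTS[n].strip() for n in slice_lines]
print(code_slice)
```

In this toy case the slice keeps lines 1, 2, 3, 5, 6, and 8 and drops the unrelated `log_start();` call. A sequence of statements like `code_slice` would then be tokenized and passed to a sequence model; a real pipeline would obtain the PDG from a static analysis tool rather than writing the edges by hand.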

References

  1. [n. d.]. Common Vulnerabilities and Exposures. https://cve.mitre.org/
  2. Amritanshu Agrawal and Tim Menzies. 2018. Is "Better Data" Better Than "Better Data Miners"?. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE). 1050–1061. https://doi.org/10.1145/3180155.3180197
  3. Wenyan An, Liwei Chen, Jinxin Wang, Gewangzi Du, Gang Shi, and Dan Meng. 2020. AVDHRAM: Automated Vulnerability Detection based on Hierarchical Representation and Attention Mechanism. In 2020 IEEE Intl Conf on Parallel and Distributed Processing with Applications, Big Data and Cloud Computing, Sustainable Computing and Communications, Social Computing and Networking (ISPA/BDCloud/SocialCom/SustainCom). 337–344. https://doi.org/10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00068
  4. Mohammad Taneem Bin Nazim, Md Jobair Hossain Faruk, Hossain Shahriar, Md Abdullah Khan, Mohammad Masum, Nazmus Sakib, and Fan Wu. 2022. Systematic Analysis of Deep Learning Model for Vulnerable Code Detection. In 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC). 1768–1773. https://doi.org/10.1109/COMPSAC54236.2022.00281
  5. Sicong Cao, Xiaobing Sun, Lili Bo, Ying Wei, and Bin Li. 2021. BGNN4VD: Constructing Bidirectional Graph Neural-Network for Vulnerability Detection. Inf. Softw. Technol. 136, C (aug 2021). https://doi.org/10.1016/j.infsof.2021.106576
  6. Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, and Baishakhi Ray. 2022. Deep Learning Based Vulnerability Detection: Are We There Yet? IEEE Transactions on Software Engineering 48, 9 (2022), 3280–3296. https://doi.org/10.1109/TSE.2021.3087402
  7. Xiao Deng, Wei Ye, Xie Rui, and Shikun Zhang. 2023. Survey of Source Code Bug Detection Based on Deep Learning. Journal of Software 34, 2 (2023), 625–654. https://doi.org/10.13328/j.cnki.jos.006696
  8. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
  9. Xu Duan, Jingzheng Wu, Shouling Ji, Zhiqing Rui, Tianyue Luo, Mutian Yang, and Yanjun Wu. 2019. VulSniper: Focus Your Attention to Shoot Fine-Grained Vulnerabilities. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (Macao, China) (IJCAI’19). AAAI Press, 4665–4671.
  10. Adanma Cecilia Eberendu, Valentine Ikechukwu Udegbe, Edmond Onwubiko Ezennorom, Anita Chinonso Ibegbulam, and Titus Ifeanyi Chinebu. 2022. A Systematic Literature Review of Software Vulnerability Detection. European Journal of Computer Science and Information Technology 10, 1 (2022), 23–37.
  11. Hantao Feng, Xiaotong Fu, Hongyu Sun, He Wang, and Yuqing Zhang. 2020. Efficient Vulnerability Detection based on abstract syntax tree and Deep Learning. In IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). 722–727. https://doi.org/10.1109/INFOCOMWKSHPS50562.2020.9163061
  12. Qi Feng, Chendong Feng, and Weijiang Hong. 2020. Graph Neural Network-based Vulnerability Predication. 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME) (2020), 800–801.
  13. Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 1536–1547. https://doi.org/10.18653/v1/2020.findings-emnlp.139
  14. Seyed Mohammad Ghaffarian and Hamid Reza Shahriari. 2021. Neural software vulnerability analysis using rich intermediate graph representations of programs. Information Sciences 553 (2021), 189–207. https://doi.org/10.1016/j.ins.2020.11.053
  15. Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin B. Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. 2021. GraphCodeBERT: Pre-training Code Representations with Data Flow. In 9th International Conference on Learning Representations. OpenReview.net. https://openreview.net/forum?id=jLoC4ez43PZ
  16. Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of Tricks for Efficient Text Classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Association for Computational Linguistics, Valencia, Spain, 427–431. https://aclanthology.org/E17-2068
  17. Jian Li, Pinjia He, Jieming Zhu, and Michael R. Lyu. 2017. Software Defect Prediction via Convolutional Neural Network. In 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS). 318–328. https://doi.org/10.1109/QRS.2017.42
  18. Yun Li, Chenlin Huang, Zhongfeng Wang, Lu Yuan, and Xiaochuan Wang. 2020. Survey of software vulnerability mining methods based on machine learning. Journal of Software 31, 7 (2020), 2040–2061.
  19. Yi Li, Shaohua Wang, Tien N. Nguyen, and Son Van Nguyen. 2019. Improving Bug Detection via Context-Based Code Representation Learning and Attention-Based Neural Networks. Proc. ACM Program. Lang. 3, OOPSLA, Article 162 (oct 2019), 30 pages. https://doi.org/10.1145/3360588
  20. Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Yawei Zhu, and Zhaoxuan Chen. 2022. SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities. IEEE Transactions on Dependable and Secure Computing 19, 4 (2022), 2244–2258. https://doi.org/10.1109/TDSC.2021.3051525
  21. Z. Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. 2018. VulDeePecker: A Deep Learning-Based System for Vulnerability Detection. ArXiv abs/1801.01681 (2018).
  22. Guanjun Lin, Jun Zhang, Wei Luo, Lei Pan, Yang Xiang, Olivier De Vel, and Paul Montague. 2018. Cross-Project Transfer Representation Learning for Vulnerable Function Discovery. IEEE Transactions on Industrial Informatics 14, 7 (2018), 3289–3297. https://doi.org/10.1109/TII.2018.2821768
  23. Henning Perl, Sergej Dechand, Matthew Smith, Daniel Arp, Fabian Yamaguchi, Konrad Rieck, Sascha Fahl, and Yasemin Acar. 2015. VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assist Code Audits (CCS ’15). Association for Computing Machinery, New York, NY, USA, 426–437. https://doi.org/10.1145/2810103.2813604
  24. Michael Pradel and Koushik Sen. 2018. DeepBugs: A Learning Approach to Name-Based Bug Detection. Proc. ACM Program. Lang. 2, OOPSLA, Article 147 (oct 2018), 25 pages. https://doi.org/10.1145/3276517
  25. Alec Radford and Karthik Narasimhan. 2018. Improving Language Understanding by Generative Pre-Training.
  26. Rebecca Russell, Louis Kim, Lei Hamilton, Tomo Lazovich, Jacob Harer, Onur Ozdemir, Paul Ellingwood, and Marc McConley. 2018. Automated Vulnerability Detection in Source Code Using Deep Representation Learning. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). 757–762. https://doi.org/10.1109/ICMLA.2018.00120
  27. Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich. 2018. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 4263–4272. https://doi.org/10.18653/v1/D18-1458
  28. Gregory Tassey. 2002. The Economic Impacts of Inadequate Infrastructure for Software Testing. National Institute of Standards and Technology (2002).
  29. Chandra Thapa, Seung Ick Jang, Muhammad Ejaz Ahmed, Seyit Camtepe, Josef Pieprzyk, and Surya Nepal. 2022. Transformer-Based Language Models for Software Vulnerability Detection. In Proceedings of the 38th Annual Computer Security Applications Conference (Austin, TX, USA) (ACSAC ’22). Association for Computing Machinery, New York, NY, USA, 481–496. https://doi.org/10.1145/3564625.3567985
  30. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 6000–6010.
  31. Haitao Wang, Jie He, Xiaohong Zhang, and Shufen Liu. 2020. A Short Text Classification Method Based on N-Gram and CNN. Chinese Journal of Electronics 29, 2 (2020), 248–254. https://doi.org/10.1049/cje.2020.01.001
  32. Shizhong Wu. 2009. Review and outlook of information security vulnerability analysis. Journal of Tsinghua University (Science and Technology) 49 (2009), 2065–2072.
  33. Shizhong Wu, Tao Guo, Guowei Dong, and Jiajie Wang. 2012. Progress in software vulnerability analysis technology. Journal of Tsinghua University (Science and Technology) 52, 10 (2012), 1309–1319.
  34. Fabian Yamaguchi, Felix Lindner, and Konrad Rieck. 2011. Vulnerability Extrapolation: Assisted Discovery of Vulnerabilities Using Machine Learning. In Proceedings of the 5th USENIX Conference on Offensive Technologies (San Francisco, CA) (WOOT’11). USENIX Association, USA, 13.
  35. Xin Zhang, Hongyu Sun, Zhipeng He, MianXue Gu, Jingyu Feng, and Yuqing Zhang. 2022. VDBWGDL: Vulnerability Detection Based On Weight Graph And Deep Learning. In 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). 186–190. https://doi.org/10.1109/DSN-W54100.2022.00039
  36. Xin Zhou, DongGyun Han, and David Lo. 2021. Assessing Generalizability of CodeBERT. In 2021 IEEE International Conference on Software Maintenance and Evolution (ICSME). 425–436. https://doi.org/10.1109/ICSME52107.2021.00044
  37. Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In NIPS Proceedings - Advances in Neural Information Processing Systems 32 (NIPS 2019) (Advances in Neural Information Processing Systems, Vol. 32). Neural Information Processing Systems (NIPS). https://nips.cc/Conferences/2019, https://papers.nips.cc/book/advances-in-neural-information-processing-systems-32-2019

        • Published in

          Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware
          August 2023
          332 pages
          ISBN:9798400708947
          DOI:10.1145/3609437

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 55 of 111 submissions, 50%