
Software Vulnerability Detection via Multimodal Deep Learning

  • Conference paper
Security and Trust Management (STM 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13867)


Abstract

Vulnerabilities in software are like ticking time bombs, yet they are difficult to eliminate completely. For example, a buffer overflow is a common vulnerability that occurs when a program writes more data into a buffer than the buffer can hold; the excess data can corrupt adjacent memory and be used to manipulate other data for malicious purposes. To detect potential vulnerabilities in source code, we treat the code as multisource data by extracting two semantically meaningful sub-graphs: the Abstract Syntax Tree Graph (ASTG) and the Tokenized Data Flow Graph (TDFG). We combine these with the original token sequence and 49 heuristic features to train a multimodal deep learning network that detects vulnerable statements. We propose a Multisource Deep Learner (MDL) whose joint representations are built on pretrained, attention-based Bidirectional Gated Recurrent Unit (BGRU) networks. Our framework not only detects potential vulnerabilities but also locates the vulnerable statements and ranks them by importance based on the Program Dependence Graph (PDG). Our results show that an MDL-based model using multiple modalities performs significantly better than single-modality models. We also present comparisons with state-of-the-art methods.
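The abstract describes the MDL only at a high level. The following is a minimal sketch of the general multimodal idea, not the authors' implementation: each modality is encoded by its own bidirectional GRU with attention pooling, the pooled vectors are concatenated with the 49 heuristic features into one joint vector, and a logistic output scores the sample as vulnerable. Several assumptions are made here and are not taken from the paper: the ASTG and TDFG are assumed to be already linearized into sequences of node feature vectors, the weights are random rather than pretrained, the joint representation is plain concatenation, and all class names (`GRUCell`, `AttentiveBGRU`, `MultimodalDetector`) and dimensions are hypothetical. Only the Python standard library is used.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rnd, scale):
    return [[rnd.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """One-direction GRU with random, untrained weights (biases omitted for brevity)."""
    def __init__(self, input_dim, hidden_dim, rnd):
        s = 1.0 / math.sqrt(hidden_dim)
        self.hidden_dim = hidden_dim
        # Index 0: update gate z, 1: reset gate r, 2: candidate state n.
        self.W = [rand_matrix(hidden_dim, input_dim, rnd, s) for _ in range(3)]
        self.U = [rand_matrix(hidden_dim, hidden_dim, rnd, s) for _ in range(3)]

    def gate(self, k, x, h, act):
        return [act(a + b) for a, b in zip(matvec(self.W[k], x), matvec(self.U[k], h))]

    def step(self, x, h):
        z = self.gate(0, x, h, sigmoid)
        r = self.gate(1, x, h, sigmoid)
        n = self.gate(2, x, [ri * hi for ri, hi in zip(r, h)], math.tanh)
        # New state interpolates between the candidate and the previous state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, xs):
        h, states = [0.0] * self.hidden_dim, []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states

class AttentiveBGRU:
    """Bidirectional GRU whose per-step states are pooled by softmax attention."""
    def __init__(self, input_dim, hidden_dim, rnd):
        self.fwd = GRUCell(input_dim, hidden_dim, rnd)
        self.bwd = GRUCell(input_dim, hidden_dim, rnd)
        self.query = [rnd.gauss(0.0, 0.1) for _ in range(2 * hidden_dim)]

    def encode(self, xs):
        fwd = self.fwd.run(xs)
        bwd = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with positions
        states = [f + b for f, b in zip(fwd, bwd)]  # list concatenation: 2 * hidden
        scores = [sum(q * s for q, s in zip(self.query, st)) for st in states]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        alpha = [e / sum(exps) for e in exps]
        pooled = [sum(a * st[i] for a, st in zip(alpha, states)) for i in range(len(states[0]))]
        return pooled, alpha

class MultimodalDetector:
    """Hypothetical fusion head: concatenate modality encodings and heuristic features."""
    def __init__(self, modality_dims, hidden_dim, n_heuristic, rnd):
        self.encoders = {name: AttentiveBGRU(d, hidden_dim, rnd) for name, d in modality_dims.items()}
        joint_dim = 2 * hidden_dim * len(modality_dims) + n_heuristic
        self.w = [rnd.gauss(0.0, 0.1) for _ in range(joint_dim)]

    def predict_proba(self, inputs, heuristics):
        joint = []
        for name, enc in self.encoders.items():
            pooled, _ = enc.encode(inputs[name])
            joint.extend(pooled)
        joint.extend(heuristics)
        return sigmoid(sum(w * x for w, x in zip(self.w, joint)))

rnd = random.Random(0)

def random_seq(length, dim):
    return [[rnd.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]

dims = {"tokens": 8, "astg": 6, "tdfg": 6}
model = MultimodalDetector(dims, hidden_dim=5, n_heuristic=49, rnd=rnd)
sample = {"tokens": random_seq(12, 8), "astg": random_seq(9, 6), "tdfg": random_seq(10, 6)}
heuristics = [rnd.random() for _ in range(49)]
p = model.predict_proba(sample, heuristics)
print(f"P(vulnerable) = {p:.3f}")
```

Because the weights are untrained, the printed probability is meaningless; the sketch only shows how the modalities flow into one joint vector. Concatenation is the simplest possible fusion, and a learned joint representation of the kind the abstract mentions would replace the plain `extend` calls. The attention weights returned by `encode` show where per-position importance could be read off, but the paper ranks statements using the PDG, which this sketch does not model.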



Acknowledgments

Research partially supported by NSF grants 1433817 and 2210198, ARO grant W911NF-20-1-0254, and ONR award N00014-19-S-F009. Verma is the founder of Everest Cyber Security and Analytics, Inc.

Author information

Corresponding authors

Correspondence to Xin Zhou or Rakesh M. Verma.


A Appendix

A.1 Limitations

Apart from the usual limitations of static analysis and machine learning, our approach has two further limitations: 1) adversarial data may degrade the model's performance, and 2) the current implementation does not perform interprocedural analysis.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, X., Verma, R.M. (2023). Software Vulnerability Detection via Multimodal Deep Learning. In: Lenzini, G., Meng, W. (eds) Security and Trust Management. STM 2022. Lecture Notes in Computer Science, vol 13867. Springer, Cham. https://doi.org/10.1007/978-3-031-29504-1_5


  • DOI: https://doi.org/10.1007/978-3-031-29504-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29503-4

  • Online ISBN: 978-3-031-29504-1

  • eBook Packages: Computer Science, Computer Science (R0)
