Vulnerability Detection Using Deep Learning Based Function Classification

  • Conference paper
Network and System Security (NSS 2022)

Abstract

Software vulnerabilities are an increasingly severe problem: they can lead to information leakage, denial of service, or even system crashes. Detecting them remains difficult, however, because software is developed in many different forms and developers follow many different programming styles. In this paper, we propose a vulnerability detection tool that explores security issues in source code using deep learning-based function classification. Specifically, we first extract and parse the function prototypes in the source code. Then, with the help of a pre-built corpus, we split the function prototypes into semantically meaningful segments. Using deep learning-based classifiers, the segments are then classified into seven categories. Finally, we apply static scanning analyzers to each category of functions separately to detect vulnerabilities. Experimental results show that the proposed method detects vulnerabilities in the benchmark source code effectively and efficiently (5 of 7 memory corruptions, 13 of 18 cryptography vulnerabilities, 5 of 6 data processing errors, and 13 of 18 random number issues).
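
The splitting step can be illustrated with a minimal C sketch. It is a simplification and not the method of the paper: instead of a pre-built corpus, it splits a function name only at underscores and at lower-to-upper case transitions. The function name `split_identifier` and the size limits are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEGMENTS 16     /* hypothetical limit on segments per name */
#define MAX_SEGMENT_LEN 32  /* hypothetical limit on segment length, including '\0' */

/*
 * Splits a function name into lowercase segments.
 * Boundaries: every '_' and every lower-to-upper case transition.
 * Assumption: this delimiter heuristic stands in for corpus-based splitting.
 * Returns the number of segments written to out.
 */
size_t split_identifier(const char *name, char out[MAX_SEGMENTS][MAX_SEGMENT_LEN])
{
    assert(name != NULL && out != NULL);

    size_t count = 0; /* completed segments */
    size_t len = 0;   /* length of the segment being built */

    for (size_t i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];
        int boundary = (c == '_') ||
            (i > 0 && isupper(c) && islower((unsigned char)name[i - 1]));

        if (boundary && len > 0) {
            out[count][len] = '\0'; /* close the current segment */
            count++;
            len = 0;
            if (count == MAX_SEGMENTS)
                return count;       /* no room for further segments */
        }
        /* Underscores are dropped; overlong segments are truncated. */
        if (c != '_' && len < MAX_SEGMENT_LEN - 1)
            out[count][len++] = (char)tolower(c);
    }
    if (len > 0) {
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

Under this heuristic, a name such as `AES_encryptBlock` yields the segments `aes`, `encrypt`, and `block`. Names written without delimiters or case changes would come back as a single segment, which is the case a corpus-based splitter is meant to handle.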

Acknowledgement

This work was supported in part by the Australian Research Council under Project DP210101859 and the University of Sydney Research Accelerator (SOAR) Prize.

Author information

Corresponding author

Correspondence to Huihui Gong.

Appendices

A Additional Vulnerability Cases

An example of a data processing error is shown in Listing 1.3 (from the file ddr_training_impl.c). In the if condition on Line 5, if the type of PHY_DRAMCFG_TYPE_LPDDR4 differs from that of cfg->phy[0].dram_type, a CWE-1024 vulnerability (comparison of incompatible types, e.g., comparing string data with int data) is incurred. Furthermore, if cfg is NULL, the code inside the if block will not be executed, which may cause other unexpected vulnerabilities.

[Listing 1.3: data processing error example from ddr_training_impl.c]
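
A minimal sketch of how the two issues described above might be guarded against is given next. It is not the code of Listing 1.3: the struct layouts, the value of the constant, and the helper `is_lpddr4` are assumptions made for illustration, and only the identifiers `PHY_DRAMCFG_TYPE_LPDDR4`, `cfg`, `phy`, and `dram_type` come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value and type; the real definition lives in the project headers. */
#define PHY_DRAMCFG_TYPE_LPDDR4 ((uint32_t)0x6)

/* Hypothetical stand-ins for the configuration structures. */
struct phy_cfg {
    uint32_t dram_type;
};

struct training_cfg {
    struct phy_cfg phy[1];
};

/* Compile-time check that the constant and the field have the same width. */
static_assert(sizeof(PHY_DRAMCFG_TYPE_LPDDR4) ==
              sizeof(((struct phy_cfg *)0)->dram_type),
              "constant and field must have the same width");

/*
 * Returns 1 if the first PHY is configured as LPDDR4, 0 if it is not,
 * and -1 if cfg is NULL, so a missing configuration is reported explicitly
 * instead of silently skipping the branch.
 */
int is_lpddr4(const struct training_cfg *cfg)
{
    if (cfg == NULL)
        return -1;
    return cfg->phy[0].dram_type == PHY_DRAMCFG_TYPE_LPDDR4 ? 1 : 0;
}
```

Giving the constant the same type as the field keeps the comparison between compatible operands, and the distinct return value for NULL lets the caller decide how to handle an absent configuration.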

For random number issues, Listing 1.4 (from the file rand.c) presents an example of the CWE-1241 vulnerability, the use of a predictable algorithm in a random number generator. The random number function rand (Lines 12–15) calls the function rand_r (Lines 3–10), which uses the constant value 1U (Line 1) as the seed and a fixed algorithm (Lines 5–7) to generate random numbers. The output is therefore predictable rather than random, which makes it vulnerable.

[Listing 1.4: random number issue example from rand.c]
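
The pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the code of Listing 1.4: the names `demo_seed`, `demo_rand_r`, `demo_rand`, and `demo_predict` are hypothetical, and the update step uses the constants of the sample `rand` implementation in the C standard, which may differ from the algorithm in rand.c.

```c
#include <assert.h>
#include <stddef.h>

/* Constant seed: every run of the program starts from the same state. */
static unsigned int demo_seed = 1U;

/*
 * Fixed linear congruential step (constants from the C standard's sample
 * rand; an assumption, not necessarily those used in rand.c).
 * Unsigned arithmetic wraps around, so the overflow is well defined.
 */
int demo_rand_r(unsigned int *seed)
{
    assert(seed != NULL);
    *seed = *seed * 1103515245U + 12345U;
    return (int)((*seed / 65536U) % 32768U);
}

/* Public generator: always advances the constant-seeded global state. */
int demo_rand(void)
{
    return demo_rand_r(&demo_seed);
}

/*
 * Attacker's view: replaying the same step from the known seed reproduces
 * the n-th output (n >= 1) without ever observing the generator.
 */
int demo_predict(unsigned int known_seed, size_t n)
{
    int value = 0;
    for (size_t i = 0; i < n; i++)
        value = demo_rand_r(&known_seed);
    return value;
}
```

Because both the seed and the update step are fixed, `demo_predict(1U, n)` returns the same value as the n-th call to `demo_rand()` in a fresh process. Seeding from an unpredictable source, or using a generator designed for security purposes, is the usual way to avoid this weakness.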

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Gong, H., Ma, S., Camtepe, S., Nepal, S., Xu, C. (2022). Vulnerability Detection Using Deep Learning Based Function Classification. In: Yuan, X., Bai, G., Alcaraz, C., Majumdar, S. (eds) Network and System Security. NSS 2022. Lecture Notes in Computer Science, vol 13787. Springer, Cham. https://doi.org/10.1007/978-3-031-23020-2_1

  • DOI: https://doi.org/10.1007/978-3-031-23020-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23019-6

  • Online ISBN: 978-3-031-23020-2

  • eBook Packages: Computer Science, Computer Science (R0)
