Vulnerability Detection Using Deep Learning Based Function Classification

Gong, Huihui; Ma, Siqi; Camtepe, Seyit; Nepal, Surya; Xu, Chang

doi:10.1007/978-3-031-23020-2_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13787)

Included in the following conference series:

International Conference on Network and System Security

Abstract

Software vulnerabilities are an increasingly severe problem, posing serious risks of information leakage, denial of service, and even system crashes. However, detecting them remains difficult because of the diverse forms of software development and the diverse programming styles of developers. In this paper, we propose a vulnerability detection tool that explores security issues in source code using deep learning-based function classification. Specifically, we first extract and parse function prototypes from the source code. Then, with the help of a pre-built corpus, we split the function prototypes into semantically meaningful segments. Using deep learning-based classifiers, these segments are further classified into seven categories. Finally, we use static scanning analyzers to detect vulnerabilities separately for each type of function. Experimental results show that the proposed method can effectively and efficiently identify vulnerabilities in the benchmark source code (5 of 7 memory corruptions, 13 of 18 cryptography vulnerabilities, 5 of 6 data processing errors, and 13 of 18 random number issues).

Acknowledgement

This work was supported in part by the Australian Research Council under Project DP210101859 and the University of Sydney Research Accelerator (SOAR) Prize.

Author information

Authors and Affiliations

The University of Sydney, Sydney, NSW, 2008, Australia
Huihui Gong & Chang Xu
Data61, CSIRO, Sydney, NSW, 1466, Australia
Huihui Gong, Seyit Camtepe & Surya Nepal
University of New South Wales Canberra, Canberra, ACT, 2612, Australia
Siqi Ma

Corresponding author

Correspondence to Huihui Gong.

Editor information

Editors and Affiliations

Monash University, Clayton, VIC, Australia
Xingliang Yuan
The University of Queensland, Queensland, QLD, Australia
Guangdong Bai
University of Malaga, Málaga, Spain
Cristina Alcaraz
Concordia University, Montreal, QC, Canada
Suryadipta Majumdar

Appendices

A Additional Vulnerability Cases

An example of a data processing error is shown in Listing 1.3 (in the file ddr_training_impl.c). In the if condition on Line 5, if the type of PHY_DRAMCFG_TYPE_LPDDR4 differs from that of cfg->phy[0].dram_type, a CWE-1024 vulnerability (comparison of incompatible types, e.g., comparing string data with int data) is incurred. Furthermore, if the struct cfg is NULL, the code within the if block will not be executed, which may lead to other unexpected vulnerabilities.

For random number issues, Listing 1.4 (in the file rand.c) presents an example of a CWE-1241 vulnerability, i.e., use of a predictable algorithm in random number generation. The random number function rand (Lines 12–15) calls the function rand_r (Lines 3–10), which uses the constant value 1U (Line 1) as the seed and a fixed algorithm (Lines 5–7) to generate numbers. The resulting sequence is therefore predictable rather than random, and thus vulnerable.


About this paper

Cite this paper

Gong, H., Ma, S., Camtepe, S., Nepal, S., Xu, C. (2022). Vulnerability Detection Using Deep Learning Based Function Classification. In: Yuan, X., Bai, G., Alcaraz, C., Majumdar, S. (eds) Network and System Security. NSS 2022. Lecture Notes in Computer Science, vol 13787. Springer, Cham. https://doi.org/10.1007/978-3-031-23020-2_1

DOI: https://doi.org/10.1007/978-3-031-23020-2_1
Published: 07 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23019-6
Online ISBN: 978-3-031-23020-2
eBook Packages: Computer Science, Computer Science (R0)
