Abstract
Software vulnerabilities are an increasingly severe problem: they can lead to information leakage, denial of service, or even system crashes. Detecting them nevertheless remains difficult because of the diversity of software development practices and of developers' programming styles. In this paper, we propose a vulnerability detection tool that explores security issues in source code using deep learning-based function classification. Specifically, we first extract and parse the function prototypes in the source code. Then, with the help of a pre-built corpus, we split the function prototypes into semantically meaningful segments. Deep learning-based classifiers then assign the segments to one of seven categories. Finally, we apply static scanning analyzers separately to each type of function to detect vulnerabilities. Experimental results show that the proposed method can effectively and efficiently identify vulnerabilities in the benchmark source code (5 of 7 memory corruptions, 13 of 18 cryptography vulnerabilities, 5 of 6 data processing errors, and 13 of 18 random number issues).
Acknowledgement
This work was supported in part by the Australian Research Council under Project DP210101859 and the University of Sydney Research Accelerator (SOAR) Prize.
Appendices
A Additional Vulnerability Cases
An example of a data processing error is shown in Listing 1.3 (from the file ddr_training_impl.c). In the if condition on Line 5, if the type of PHY_DRAMCFG_TYPE_LPDDR4 differs from that of cfg->phy[0].dram_type, the comparison incurs the CWE-1024 vulnerability (comparison of incompatible types, e.g., comparing string data with int data). Furthermore, if the struct cfg is NULL, the code within the if condition will not be executed, which may cause other unexpected vulnerabilities.
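The two concerns raised in this case, mismatched operand types and a possibly NULL cfg, can be illustrated with a minimal defensive sketch. This is not the code from ddr_training_impl.c: the struct layout, the `demo_` names, and the constant's value are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the constant discussed above; the value is
   an assumption. The U suffix gives it the same type as the field. */
#define DEMO_DRAM_TYPE_LPDDR4 0x6U

/* Hypothetical, simplified layout; the real structures are not shown here. */
struct demo_phy { unsigned int dram_type; };
struct demo_cfg { struct demo_phy phy[1]; };

/* Compile-time check that constant and field have the same width.
   This does not verify signedness, which still needs manual review. */
static_assert(sizeof(DEMO_DRAM_TYPE_LPDDR4) ==
              sizeof(((struct demo_phy *)0)->dram_type),
              "constant and field widths differ");

/* Returns 1 for LPDDR4, 0 for any other type, and -1 if cfg is NULL,
   so the pointer is never dereferenced before it has been checked. */
int demo_is_lpddr4(const struct demo_cfg *cfg)
{
    if (cfg == NULL)
        return -1;
    return cfg->phy[0].dram_type == DEMO_DRAM_TYPE_LPDDR4 ? 1 : 0;
}
```

A caller would branch on the three return values instead of embedding the raw comparison in an if condition, which makes the NULL case explicit.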
For random number issues, Listing 1.4 (from the file rand.c) presents an example of the CWE-1241 vulnerability, which concerns the use of a predictable algorithm in random number generation. The random number function rand (Lines 12–15) calls the function rand_r (Lines 3–10), which uses the constant value 1U (Line 1) as the random number seed and a fixed algorithm (Lines 5–7) to generate random numbers. The output is therefore predictable rather than random, and vulnerable.
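To make the predictability concrete, the following sketch mimics the pattern described above: a constant seed fed into a fixed update step. It is not the paper's Listing 1.4; the `demo_` names and the linear congruential constants are assumptions chosen for illustration.

```c
#include <assert.h>

/* Constant initial seed, mirroring the 1U seed described in the text. */
static unsigned int demo_seed = 1U;

/* Fixed linear congruential step (constants are an assumption):
   the result depends only on the current value of *seed. */
int demo_rand_r(unsigned int *seed)
{
    *seed = *seed * 1103515245U + 12345U;
    return (int)((*seed >> 16) & 0x7fffU);
}

/* Wrapper that always draws from the constant-initialized global seed. */
int demo_rand(void)
{
    return demo_rand_r(&demo_seed);
}

/* Anyone who knows the seed and the update step can replay the
   generator and obtain exactly the same sequence of values. */
void demo_check_predictable(void)
{
    unsigned int attacker_seed = 1U;
    demo_seed = 1U;
    for (int i = 0; i < 8; i++) {
        assert(demo_rand() == demo_rand_r(&attacker_seed));
    }
}
```

Calling `demo_check_predictable()` passes every assertion, which is the weakness in question: the sequence carries no entropy beyond the publicly known seed.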
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gong, H., Ma, S., Camtepe, S., Nepal, S., Xu, C. (2022). Vulnerability Detection Using Deep Learning Based Function Classification. In: Yuan, X., Bai, G., Alcaraz, C., Majumdar, S. (eds) Network and System Security. NSS 2022. Lecture Notes in Computer Science, vol 13787. Springer, Cham. https://doi.org/10.1007/978-3-031-23020-2_1
DOI: https://doi.org/10.1007/978-3-031-23020-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23019-6
Online ISBN: 978-3-031-23020-2
eBook Packages: Computer Science, Computer Science (R0)