A Review of Data Representation Methods for Vulnerability Mining Using Deep Learning

Li, Ying; Gu, Mianxue; Sun, Hongyu; Lin, Yuhao; Yue, Qiuling; Guo, Zhen; Hu, Jinglu; Wang, He; Zhang, Yuqing

doi:10.1007/978-981-19-0523-0_22


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1558)

Included in the following conference series:

International Conference on Frontiers in Cyber Security


Abstract

The rapid development of software has brought unprecedented challenges for software security. Traditional vulnerability mining methods are difficult to apply to large-scale software systems because they rely on manual inspection and suffer from low efficiency, high false-positive rates, and high false-negative rates. Recent research has applied deep learning models to vulnerability mining and has made good progress in the field. In this paper, we analyze the deep learning framework applied to vulnerability mining and summarize its overall workflow and techniques. We then give a detailed analysis of five feature extraction methods for vulnerability mining: sequence characterization-based, abstract syntax tree-based, graph-based, text-based, and mixed characterization-based methods. In addition, we summarize their advantages and disadvantages from the perspectives of single and mixed feature extraction. Finally, we point out future research trends and prospects.
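As a rough illustration of how one piece of code can be represented in several of the ways named above, the following minimal sketch derives a token sequence, a list of abstract syntax tree node types, and a set of parent-child edges from one small snippet. It is not the authors' method. It relies only on Python's standard `tokenize` and `ast` modules, and the snippet and helper names (`SNIPPET`, `token_sequence`, `ast_node_types`, `ast_edges`) are hypothetical. Using AST edges as the graph view is a simplifying assumption, since graph-based approaches may use richer program graphs.

```python
import ast
import io
import tokenize

# Hypothetical snippet used only for illustration.
SNIPPET = "def copy(buf, src):\n    buf[0:len(src)] = src\n"

# Token types that carry layout rather than content.
_SKIPPED = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def token_sequence(source):
    """Sequence-style view: a flat list of lexical tokens."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in _SKIPPED]


def ast_node_types(source):
    """Tree-style view: AST node type names in breadth-first order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def ast_edges(source):
    """Graph-style view (assumption): parent -> child edges of the AST."""
    edges = []
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges


if __name__ == "__main__":
    print(token_sequence(SNIPPET))
    print(ast_node_types(SNIPPET))
    print(ast_edges(SNIPPET))
```

Each view would typically be encoded as numeric vectors before being fed to a model; that step is omitted here.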

This work was supported by the Key Research and Development Science and Technology of Hainan Province (ZDYF202012), the National Key Research and Development Program of China (2018YFB0804701), and the National Natural Science Foundation of China (U1836210).



Author information

Authors and Affiliations

School of Cyberspace Security (School of Cryptography), Hainan University, Haikou, Hainan, 570100, China
Ying Li, Mianxue Gu, Yuhao Lin, Qiuling Yue, Zhen Guo & Yuqing Zhang
National Computer Network Intrusion Protection Center, University of Chinese Academy of Sciences, Beijing, 101408, China
Ying Li, Mianxue Gu, Hongyu Sun, Yuhao Lin, He Wang & Yuqing Zhang
School of Cyber Engineering, Xidian University, Xi’an, 710126, Shaanxi, China
Hongyu Sun, He Wang & Yuqing Zhang
Graduate School of Information, Production and Systems, Waseda University, Shinjuku-ku, Tokyo, 169-8050, Japan
Jinglu Hu


Corresponding author

Correspondence to Yuqing Zhang.

Editor information

Editors and Affiliations

Hainan University, Haikou, China
Chunjie Cao
National Computer Network Intrusion Protection Center (NCNIPC), Beijing, China
Yuqing Zhang
Illinois Institute of Technology, Chicago, IL, USA
Yuan Hong
Nankai University, Tianjin, China
Ding Wang


Cite this paper

Li, Y. et al. (2022). A Review of Data Representation Methods for Vulnerability Mining Using Deep Learning. In: Cao, C., Zhang, Y., Hong, Y., Wang, D. (eds) Frontiers in Cyber Security. FCS 2021. Communications in Computer and Information Science, vol 1558. Springer, Singapore. https://doi.org/10.1007/978-981-19-0523-0_22


DOI: https://doi.org/10.1007/978-981-19-0523-0_22
Published: 01 March 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0522-3
Online ISBN: 978-981-19-0523-0
eBook Packages: Computer Science, Computer Science (R0)
