Abstract
The rapid development of software has made security vulnerabilities an unprecedented and severe challenge. Traditional vulnerability mining methods are difficult to apply to large-scale software systems because they rely on manual inspection and suffer from low efficiency, high false-positive rates, and high false-negative rates. Recent research has applied deep learning models to vulnerability mining and has made good progress in the field. In this paper, we analyze the deep learning model framework applied to vulnerability mining and summarize its overall workflow and techniques. We then give a detailed analysis of five feature extraction methods for vulnerability mining: the sequence characterization-based method, the abstract syntax tree-based method, the graph-based method, the text-based method, and the mixed characterization-based method. In addition, we summarize their advantages and disadvantages from the perspectives of single and mixed feature extraction. Finally, we point out future research trends and prospects.
This work was supported by the Key Research and Development Science and Technology of Hainan Province (ZDYF202012), the National Key Research and Development Program of China (2018YFB0804701), and the National Natural Science Foundation of China (U1836210).
© 2022 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, Y. et al. (2022). A Review of Data Representation Methods for Vulnerability Mining Using Deep Learning. In: Cao, C., Zhang, Y., Hong, Y., Wang, D. (eds) Frontiers in Cyber Security. FCS 2021. Communications in Computer and Information Science, vol 1558. Springer, Singapore. https://doi.org/10.1007/978-981-19-0523-0_22
DOI: https://doi.org/10.1007/978-981-19-0523-0_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0522-3
Online ISBN: 978-981-19-0523-0
eBook Packages: Computer Science, Computer Science (R0)