Vulnerability Detection with Representation Learning

  • Conference paper
Ubiquitous Security (UbiSec 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1768)

Abstract

Identifying potentially vulnerable code in software systems is essential, and deep neural networks have been applied to the task. However, existing methods usually neglect the feature representation of vulnerability datasets, which leads to unsatisfactory model performance. A practical vulnerability detection technique should achieve high accuracy, a relatively high true-positive rate, and a low false-negative rate. It should also be able to analyze real-world projects without additional expert knowledge or tedious configuration. In this paper, we propose and implement VDDRL (a Vulnerability Detection method based on Deep Representation Learning), which combines feature extraction and ensemble learning. VDDRL uses the word2vec model to convert source code into a vector representation. LSTM models then learn deep representations of vulnerable code from its token sequences, and traditional machine learning algorithms are trained on these representations for classification. The training dataset is derived from real projects and contains seven different types of vulnerabilities. In comparative experiments on these datasets, VDDRL achieves an accuracy of 95.6%–98.7%, a precision of 91.6%–99.0%, a recall of 84.7%–99.5%, and an F1 score of 88.1%–99.2%, outperforming the baseline method. Our experimental results show that VDDRL is a generic, lightweight, and extensible vulnerability detection method, with better performance and robustness than the methods it is compared against.
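The four reported metrics, together with the true-positive and false-negative rates named as goals, all follow from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions and is not taken from the VDDRL implementation; the names `ConfusionCounts`, `count_outcomes`, and `detection_metrics`, and the label convention (1 = vulnerable, 0 = not vulnerable), are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary detector (hypothetical helper type)."""
    tp: int  # vulnerable samples flagged as vulnerable
    fp: int  # non-vulnerable samples flagged as vulnerable
    tn: int  # non-vulnerable samples left unflagged
    fn: int  # vulnerable samples the detector missed


def count_outcomes(y_true, y_pred):
    """Tally outcomes, assuming label 1 = vulnerable and 0 = not vulnerable."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def detection_metrics(c):
    """Standard definitions of the metrics quoted in the abstract."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # identical to the true-positive rate
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "false_negative_rate": _ratio(c.fn, c.tp + c.fn),  # 1 - recall
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    for name, value in detection_metrics(count_outcomes(truth, predicted)).items():
        print(f"{name}: {value:.3f}")
```

Because recall equals the true-positive rate and the false-negative rate is its complement, the two detection goals stated above are a single requirement seen from two sides: raising recall necessarily lowers the false-negative rate.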

Z. Wang and S. Meng—These authors contributed to the work equally and should be regarded as co-first authors.

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB0803401, in part by the China Postdoctoral Science Foundation funded project (2019M650606), and in part by the First-class Discipline Construction Project of Beijing Electronic Science and Technology Institute (3201012).

Author information

Corresponding author

Correspondence to Ying Chen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, Z., Meng, S., Chen, Y. (2023). Vulnerability Detection with Representation Learning. In: Wang, G., Choo, KK.R., Wu, J., Damiani, E. (eds) Ubiquitous Security. UbiSec 2022. Communications in Computer and Information Science, vol 1768. Springer, Singapore. https://doi.org/10.1007/978-981-99-0272-9_8

  • DOI: https://doi.org/10.1007/978-981-99-0272-9_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-0271-2

  • Online ISBN: 978-981-99-0272-9

  • eBook Packages: Computer Science, Computer Science (R0)
