Towards Attention Based Vulnerability Discovery Using Source Code Representation

Kim, Junae; Hubczenko, David; Montague, Paul

doi:10.1007/978-3-030-30490-4_58

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11730)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Vulnerability discovery in software is an important task in the field of computer security. As vulnerabilities can be abused by cyber criminals and other malicious actors to exploit systems, it is crucial to keep software as free from vulnerabilities as possible. Traditional approaches often comprise code scanning tasks to find specific and already-known classes of cyber vulnerabilities. However, these approaches do not in general discover new classes of vulnerabilities. In this paper, we leverage a machine learning approach to model source code representation using the syntax, semantics and control flow of source code, and to infer vulnerable code patterns. The aim is to tackle large code bases and identify potential vulnerabilities that are missed by existing static software analysis tools. In addition, our attention-based bidirectional long short-term memory framework adaptively localises regions of code, illustrating where a possibly vulnerable code fragment exists. The highlighted region may provide informative guidance to human developers or security experts. The experimental results demonstrate the feasibility of the proposed approach for the problem of software vulnerability discovery.
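
To make the localisation idea in the abstract concrete, the following is a minimal sketch of attention pooling over per-token hidden states. It is not the authors' architecture: the dot-product scoring, the toy hidden vectors, the context vector and the helper names (attention_pool, top_regions) are all assumptions made for illustration. In a real model the hidden states would come from a trained bidirectional LSTM and the context vector would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Dot-product attention over a sequence of hidden states.

    hidden_states: one vector per code token (assumed here to stand in for
                   BiLSTM outputs; the values below are made up).
    context:       query vector (hypothetical; learned in a real model).
    Returns the attention weights and the weighted sum of hidden states.
    """
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, pooled


def top_regions(tokens, weights, k=2):
    """Tokens with the k largest attention weights, highest first."""
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return [tok for tok, _ in ranked[:k]]


# Toy input: tokens of a C fragment with hand-picked 2-d "hidden states".
tokens = ["char", "buf[8]", "strcpy", "(", "buf", ",", "input", ")"]
hidden = [[0.1, 0.0], [0.9, 0.4], [1.5, 1.2], [0.0, 0.1],
          [0.6, 0.3], [0.0, 0.0], [0.8, 0.9], [0.0, 0.1]]
context = [1.0, 1.0]

weights, pooled = attention_pool(hidden, context)
print(top_regions(tokens, weights, k=3))  # ['strcpy', 'input', 'buf[8]']
```

The weights sum to one, so they can be read as a distribution over tokens. The highest-weighted positions are the ones that would be highlighted for a developer or security expert, while the pooled vector would feed a downstream classifier.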

Notes

1. We do not distinguish a node in the AST and an edge in the CFG in the notation because we process them in the same way.
2. C was chosen because of its ubiquity and the abundance of datasets. We believe that our technique would be applicable to other programming languages.
3. The datasets are publicly available on GitHub, https://github.com/DanielLin1986/TransferRepresentationLearning.
4. The description of the vulnerability can be found at https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-5907.

Author information

Authors and Affiliations

Defence Science and Technology, Edinburgh, Australia
Junae Kim, David Hubczenko & Paul Montague

Corresponding author

Correspondence to Junae Kim.

Editor information

Editors and Affiliations

Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Igor V. Tetko
Institute of Computer Science, Czech Academy of Sciences, Prague 8, Czech Republic
Věra Kůrková
Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Pavel Karpov
Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Fabian Theis

About this paper

Cite this paper

Kim, J., Hubczenko, D., Montague, P. (2019). Towards Attention Based Vulnerability Discovery Using Source Code Representation. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. ICANN 2019. Lecture Notes in Computer Science, vol 11730. Springer, Cham. https://doi.org/10.1007/978-3-030-30490-4_58

DOI: https://doi.org/10.1007/978-3-030-30490-4_58
Published: 09 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30489-8
Online ISBN: 978-3-030-30490-4
eBook Packages: Computer Science, Computer Science (R0)
