Multiclass Classification of Software Vulnerabilities with Deep Learning

Authors:

Crystal Contreras,

Hristina Dokic,

Daniela Stan Raicu,

Roselyne Tchoua

ICMLC '23: Proceedings of the 2023 15th International Conference on Machine Learning and Computing

Pages 134 - 140

https://doi.org/10.1145/3587716.3587738

Published: 07 September 2023

Abstract

Detecting software vulnerabilities has been a challenge for decades. Many techniques detect vulnerabilities by reporting whether one exists in a program's code, but few can categorize the types of the vulnerabilities they detect, which is crucial for human developers and other tools that need to analyze and address them. In this paper, we present our work on identifying the types of vulnerabilities using deep learning. Our data, sourced from prior work, consists of code slices parsed in a manner that captures the syntax and semantics of a vulnerability. We train deep neural networks on these features to perform multiclass classification of the software vulnerabilities in the dataset. Our experiments show that our models can effectively identify the vulnerability classes of the vulnerable functions in our dataset.
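
As a rough illustration of the multiclass setup described in the abstract, the Python sketch below trains a single-layer softmax classifier on bag-of-token counts from a few made-up code slices. It is not the authors' model or data: the slices, the CWE labels, the tokenizer, and every function name are assumptions chosen for illustration. In practice, a deep network with learned token embeddings would take the place of the linear layer.

```python
import math
import re
from collections import Counter

# Hypothetical toy "code slices" with illustrative CWE labels. The paper's
# real dataset, features, and class set are not reproduced here.
SLICES = [
    ("char buf[8]; strcpy(buf, src);", "CWE-120"),
    ("char dst[4]; strcpy(dst, input);", "CWE-120"),
    ("free(ptr); ptr->len = 0;", "CWE-416"),
    ("free(node); use(node);", "CWE-416"),
    ("int total = count * size; malloc(total);", "CWE-190"),
    ("int n = a * b; malloc(n);", "CWE-190"),
]


def tokenize(code):
    """Split a slice into identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def build_vocab(samples):
    """Map every token seen in the training slices to a feature index."""
    tokens = sorted({tok for code, _ in samples for tok in tokenize(code)})
    return {tok: i for i, tok in enumerate(tokens)}


def featurize(code, vocab):
    """Bag-of-tokens count vector (a stand-in for learned embeddings)."""
    counts = Counter(tokenize(code))
    return [float(counts.get(tok, 0)) for tok in vocab]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, weights, bias):
    """Class probabilities for one feature vector."""
    scores = [b + sum(w * xj for w, xj in zip(row, x))
              for row, b in zip(weights, bias)]
    return softmax(scores)


def train(samples, vocab, classes, epochs=200, lr=0.1):
    """Per-sample gradient descent on the softmax cross-entropy loss."""
    weights = [[0.0] * len(vocab) for _ in classes]
    bias = [0.0] * len(classes)
    for _ in range(epochs):
        for code, label in samples:
            x = featurize(code, vocab)
            probs = predict_proba(x, weights, bias)
            for k, cls in enumerate(classes):
                # Gradient of cross-entropy w.r.t. score k: p_k - y_k.
                grad = probs[k] - (1.0 if cls == label else 0.0)
                bias[k] -= lr * grad
                for j, xj in enumerate(x):
                    weights[k][j] -= lr * grad * xj
    return weights, bias


def classify(code, model):
    """Return the most probable class label for a code slice."""
    vocab, classes, weights, bias = model
    probs = predict_proba(featurize(code, vocab), weights, bias)
    return classes[probs.index(max(probs))]


CLASSES = sorted({label for _, label in SLICES})
VOCAB = build_vocab(SLICES)
WEIGHTS, BIAS = train(SLICES, VOCAB, CLASSES)
MODEL = (VOCAB, CLASSES, WEIGHTS, BIAS)

if __name__ == "__main__":
    print(classify("char name[16]; strcpy(name, user);", MODEL))
```

The point of the sketch is the output shape: one probability per vulnerability class rather than a single vulnerable/not-vulnerable score. Tokens outside the training vocabulary are ignored here, which a real pipeline would handle with an embedding model or an unknown-token bucket.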

Published In

ICMLC '23: Proceedings of the 2023 15th International Conference on Machine Learning and Computing

February 2023

619 pages

ISBN:9781450398411

DOI:10.1145/3587716

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

ICMLC 2023

ICMLC 2023: 2023 15th International Conference on Machine Learning and Computing

February 17 - 20, 2023

Zhuhai, China
