DOI: 10.1145/3587716.3587738
Research article

Multiclass Classification of Software Vulnerabilities with Deep Learning

Published: 07 September 2023

Abstract

Detecting software vulnerabilities has been a challenge for decades. Many techniques detect vulnerabilities by reporting whether one exists in a program's code, but few can categorize the types of the vulnerabilities they detect, which is crucial for human developers and other tools that analyze and address them. In this paper, we present our work on identifying vulnerability types using deep learning. Our data, sourced from prior work, consists of code slices parsed in a manner that captures the syntax and semantics of a vulnerability. We train deep neural networks on these features to perform multiclass classification of the software vulnerabilities in the dataset. Our experiments show that our models can effectively identify the vulnerability classes of the vulnerable functions in our dataset.
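
To make the idea of multiclass classification over code slices concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a bag-of-tokens feature vector and a single softmax layer trained with plain gradient descent, which is far simpler than the deep networks described above. The code slices, the three class labels, and all function names are hypothetical and exist only for illustration.

```python
import math

# Toy, made-up code slices and class labels for illustration only.
# They are NOT the paper's dataset, and the label names are hypothetical.
SLICES = [
    ("char buf [ 8 ] ; strcpy ( buf , src ) ;", "buffer"),
    ("memcpy ( dst , src , len ) ;", "buffer"),
    ("int n = a * b ; malloc ( n ) ;", "integer"),
    ("size = count + extra ; malloc ( size ) ;", "integer"),
    ("printf ( user ) ;", "format"),
    ("fprintf ( f , user ) ;", "format"),
]
CLASSES = ["buffer", "integer", "format"]


def build_vocab(slices):
    """Map each distinct whitespace-separated token to a feature index."""
    tokens = sorted({tok for text, _ in slices for tok in text.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def featurize(text, vocab):
    """Bag-of-tokens count vector (an assumed stand-in for learned embeddings)."""
    vec = [0.0] * len(vocab)
    for tok in text.split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, x):
    """One linear score per class: w_c . x + b_c."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train(samples, labels, n_classes, epochs=200, lr=0.1):
    """Fit one softmax layer with per-sample gradient descent on cross-entropy."""
    n_features = len(samples[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            probs = softmax(class_scores(weights, biases, x))
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. the class score: p_c - 1[c == y]
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for j, xj in enumerate(x):
                    weights[c][j] -= lr * grad * xj
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the most probable class."""
    probs = softmax(class_scores(weights, biases, x))
    return max(range(len(probs)), key=probs.__getitem__)


vocab = build_vocab(SLICES)
X = [featurize(text, vocab) for text, _ in SLICES]
y = [CLASSES.index(label) for _, label in SLICES]
W, b = train(X, y, len(CLASSES))
predicted = [CLASSES[predict(W, b, x)] for x in X]
print(predicted)
```

The point of the sketch is the output layer: unlike a binary detector that answers only "vulnerable or not", the softmax produces one probability per vulnerability class, and the predicted type is the class with the highest probability. A realistic model would replace the count vector with learned token embeddings and add hidden layers, and would be evaluated on held-out data rather than on its own training set.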

Published In

ICMLC '23: Proceedings of the 2023 15th International Conference on Machine Learning and Computing
February 2023, 619 pages
ISBN: 9781450398411
DOI: 10.1145/3587716
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Vulnerability classification
2. deep learning
3. machine learning
4. neural networks
5. software and application security
