DOI: 10.1145/3529836.3529926
research-article

Learning-based Vulnerability Detection in Binary Code

Published: 21 June 2022

Abstract

Cyberattacks typically exploit software vulnerabilities to compromise computers and smart devices. Many deep learning approaches have been developed to detect such vulnerabilities, but most of them operate on source code rather than binary code. In this paper, we present our approach to detecting vulnerabilities in binary code. Our approach builds deep learning models from binary code compiled from the SARD dataset. It extracts features from the syntax information of the assembly instructions in the binary code and trains two deep learning models on these features for vulnerability detection. In our evaluation, the BLSTM model performs best, achieving an accuracy of 81% in detecting vulnerabilities. In particular, the F1-score, recall, and specificity of the BLSTM model are 75%, 95%, and 75%, respectively. This indicates that the model is balanced in detecting both vulnerable and non-vulnerable code.
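For readers less familiar with the reported metrics, the sketch below shows how accuracy, recall, specificity, and F1-score are conventionally derived from the confusion counts of a binary classifier. It is an illustration only: the class name `ConfusionCounts`, the function `evaluate`, and the example counts are hypothetical and are not taken from the paper or its evaluation data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion counts for a binary detector (hypothetical helper type)."""
    tp: int  # vulnerable samples flagged as vulnerable
    fp: int  # non-vulnerable samples flagged as vulnerable
    tn: int  # non-vulnerable samples flagged as non-vulnerable
    fn: int  # vulnerable samples flagged as non-vulnerable


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Compute the standard metrics from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)        # true positive rate
    specificity = _ratio(c.tn, c.tn + c.fp)   # true negative rate
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# Illustrative counts only; these are NOT the paper's results.
example = evaluate(ConfusionCounts(tp=8, fp=4, tn=6, fn=2))
```

Recall measures how many vulnerable samples are caught, while specificity measures how many non-vulnerable samples are correctly left alone, so the two are usually read together when judging how a detector treats each class.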


Published In

ICMLC '22: Proceedings of the 2022 14th International Conference on Machine Learning and Computing
February 2022
570 pages
ISBN:9781450395700
DOI:10.1145/3529836

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deep learning
  2. machine learning
  3. neural network
  4. software vulnerability
  5. vulnerability detection

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2022
