DOI: 10.1145/3579856.3582818
research-article

Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures

Published: 10 July 2023

ABSTRACT

Binary function clone search is an essential capability that enables multiple applications and use cases, including reverse engineering, patch security inspection, threat analysis, and vulnerable function detection. As such, there has been a surge of interest in designing and implementing techniques to address function similarity on binary executables and firmware images. Although existing approaches have merit in fingerprinting function clones, they present limitations when the target binary code has been subjected to significant code transformation resulting from obfuscation, compiler optimization, and/or cross-compilation to multiple CPU architectures. In this regard, we design and implement a system named BinFinder, which employs a neural network to learn binary function embeddings based on a set of extracted features that are resilient to both code obfuscation and compiler optimization techniques. Our experimental evaluation indicates that BinFinder outperforms state-of-the-art approaches for multi-CPU architectures by a large margin, with 46% higher Recall against Gemini, 55% higher Recall against SAFE, and 28% higher Recall against GMN. With respect to clone search approaches that target obfuscation and compiler optimization, BinFinder achieves higher Recall than both asm2vec (a single-CPU-architecture approach) and BinMatch (a multi-CPU-architecture approach). Finally, our work is the first to provide noteworthy results with respect to binary clone search over Tigress, a well-established open-source obfuscator.

References

  1. Saed Alrabaee, Mourad Debbabi, and Lingyu Wang. 2022. A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features. ACM Computing Surveys (CSUR) 55, 1 (2022), 1–41.
  2. Christopher M. Bishop. 1995. Neural Networks for Pattern Recognition. Oxford University Press.
  3. Steven H. H. Ding, Benjamin C. M. Fung, and Philippe Charland. 2019. Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization. In 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019. IEEE, 472–489. https://doi.org/10.1109/SP.2019.00003
  4. Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. 2016. discovRE: Efficient Cross-Architecture Identification of Bugs in Binary Code. In NDSS.
  5. Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. 2016. Scalable graph-based bug search for firmware images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 480–491.
  6. FLIRT. 2020. FLIRT @ONLINE. https://hex-rays.com/products/ida/tech/flirt/.
  7. Yikun Hu, Hui Wang, Yuanyuan Zhang, Bodong Li, and Dawu Gu. 2019. A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison. IEEE Transactions on Software Engineering 47 (2019), 1241–1258.
  8. idapro. 2020. idapro @ONLINE. https://www.hex-rays.com/products/ida/index.shtml.
  9. Jianguo Jiang, Gengwang Li, Min Yu, Gang Li, Chao Liu, Zhiqiang Lv, Bin Lv, and Weiqing Huang. 2020. Similarity of binaries across optimization levels and obfuscation. In European Symposium on Research in Computer Security. Springer, 295–315.
  10. Pascal Junod, Julien Rinaldini, Johan Wehrli, and Julie Michielin. 2015. Obfuscator-LLVM–software protection for the masses. In 2015 IEEE/ACM 1st International Workshop on Software Protection. IEEE, 3–9.
  11. Dongkwan Kim, Eunsoo Kim, Sang Kil Cha, Sooel Son, and Yongdae Kim. 2022. Revisiting binary code similarity analysis using interpretable feature engineering and lessons learned. IEEE Transactions on Software Engineering (2022).
  12. Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1412.6980
  13. Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. 2019. Graph matching networks for learning the similarity of graph structured objects. In International conference on machine learning. PMLR, 3835–3845.
  14. Bingchang Liu, Wei Huo, Chao Zhang, Wenchao Li, Feng Li, Aihua Piao, and Wei Zou. 2018. αDiff: cross-version binary code similarity detection with DNN. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 667–678.
  15. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Evaluation in Information Retrieval. Cambridge University Press, 139–161. https://doi.org/10.1017/CBO9780511809071.009
  16. Andrea Marcelli, Mariano Graziano, Xabier Ugarte-Pedrero, Yanick Fratantonio, Mohamad Mansouri, and Davide Balzarotti. [n. d.]. How Machine Learning Is Solving the Binary Function Similarity Problem. ([n. d.]).
  17. Andrea Marcelli, Mariano Graziano, Xabier Ugarte-Pedrero, Yanick Fratantonio, Mohamad Mansouri, and Davide Balzarotti. 2022. How Machine Learning Is Solving the Binary Function Similarity Problem. In 31st USENIX Security Symposium (USENIX Security 22). 2099–2116.
  18. Luca Massarelli, Giuseppe Antonio Di Luna, Fabio Petroni, Roberto Baldoni, and Leonardo Querzoni. 2019. SAFE: Self-Attentive Function Embeddings for Binary Similarity. In Detection of Intrusions and Malware, and Vulnerability Assessment - 16th International Conference, DIMVA 2019, Gothenburg, Sweden, June 19-20, 2019, Proceedings (Lecture Notes in Computer Science, Vol. 11543), Roberto Perdisci, Clémentine Maurice, Giorgio Giacinto, and Magnus Almgren (Eds.). Springer, 309–329. https://doi.org/10.1007/978-3-030-22038-9_15
  19. Nicholas Nethercote and Julian Seward. 2007. Valgrind: a framework for heavyweight dynamic binary instrumentation. In ACM Sigplan notices, Vol. 42. ACM, 89–100.
  20. Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. 2002. On spectral clustering: Analysis and an algorithm. In Advances in neural information processing systems. 849–856.
  21. Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, and Baishakhi Ray. 2020. Trex: Learning execution semantics from micro-traces for binary similarity. arXiv preprint arXiv:2012.08680 (2020).
  22. Federico Scrinzi. 2015. Behavioral Analysis of Obfuscated Code. http://essay.utwente.nl/67522/
  23. Noam Shalev and Nimrod Partush. 2018. Binary similarity detection using machine learning. In Proceedings of the 13th Workshop on Programming Languages and Analysis for Security. 42–47.
  24. Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Krügel, and Giovanni Vigna. 2016. SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA, May 22-26, 2016. IEEE Computer Society, 138–157. https://doi.org/10.1109/SP.2016.17
  25. tigress. 2020. tigress @ONLINE. https://tigress.wtf/.
  26. Hao Wang, Wenjie Qu, Gilad Katz, Wenyu Zhu, Zeyu Gao, Han Qiu, Jianwei Zhuge, and Chao Zhang. 2022. jTrans: jump-aware transformer for binary code similarity detection. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 1–13.
  27. Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, and Dawn Song. 2017. Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 363–376.
  28. Zeping Yu, Rui Cao, Qiyi Tang, Sen Nie, Junzhou Huang, and Shi Wu. 2020. Order matters: semantic-aware neural networks for binary code similarity detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 1145–1152.
  29. Zeping Yu, Wenxin Zheng, Jiaqi Wang, Qiyi Tang, Sen Nie, and Shi Wu. 2020. Codecmr: Cross-modal retrieval for function-level binary source code matching. Advances in Neural Information Processing Systems 33 (2020), 3872–3883.
  30. Fei Zuo, Xiaopeng Li, Patrick Young, Lannan Luo, Qiang Zeng, and Zhexin Zhang. 2019. Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs. In 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, February 24-27, 2019. The Internet Society. https://www.ndss-symposium.org/ndss-paper/neural-machine-translation-inspired-binary-code-similarity-comparison-beyond-function-pairs/

Published in

ASIA CCS '23: Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security
July 2023
1066 pages
ISBN: 9798400700989
DOI: 10.1145/3579856

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 418 of 2,322 submissions, 18%