DOI: 10.1145/3062341.3062387
research-article

Similarity of binaries through re-optimization

Published: 14 June 2017

ABSTRACT

We present a scalable approach for establishing similarity between stripped binaries (with no debug information). The main challenge in binary similarity is to establish similarity even when the code has been compiled using different compilers, with different optimization levels, or targeting different architectures. Overcoming this challenge while avoiding false positives is invaluable for reverse engineering and for locating vulnerable code.

We present a technique that is scalable and precise, as it avoids the need for heavyweight semantic comparison by performing out-of-context re-optimization of procedure fragments. It works by decomposing binary procedures into comparable fragments and transforming them into a canonical, normalized form using the compiler optimizer, which enables finding equivalent fragments through simple syntactic comparison. We use a statistical framework, built by analyzing samples collected "in the wild", to generate a global context that quantifies the significance of each pair of fragments, and we use that context to lift pairwise fragment equivalence to whole-procedure similarity.
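The pipeline described in the abstract (decompose into fragments, re-optimize each fragment into a canonical form, compare syntactically, then weight matches using a global context) can be illustrated with a small Python sketch. This is a toy under stated assumptions, not GitZ's implementation: the three-address tuple IR, the `canonicalize` normalizer (constant folding, commutative operand ordering and variable renaming, standing in for a real compiler optimizer), and the smoothed `-log` frequency weight in `global_weights` are all hypothetical choices made for this example.

```python
import math
from collections import Counter

# Hypothetical toy IR (not GitZ's): a fragment is a list of
# (dest, op, lhs, rhs) tuples whose operands are variable names (str) or ints.
FOLD = {"add": lambda x, y: x + y, "sub": lambda x, y: x - y, "mul": lambda x, y: x * y}
COMMUTATIVE = {"add", "mul"}


def canonicalize(fragment):
    """Toy stand-in for re-optimization: fold constants, order operands, rename."""
    consts, kept = {}, []
    for dest, op, lhs, rhs in fragment:
        # Propagate known constants into the operands.
        lhs, rhs = consts.get(lhs, lhs), consts.get(rhs, rhs)
        if isinstance(lhs, int) and isinstance(rhs, int):
            consts[dest] = FOLD[op](lhs, rhs)  # fully constant: fold it away
            continue
        consts.pop(dest, None)  # dest now holds a non-constant value
        if op in COMMUTATIVE and isinstance(lhs, int):
            lhs, rhs = rhs, lhs  # put the constant on the right
        kept.append((dest, op, lhs, rhs))

    names = {}  # original register/variable name -> canonical name

    def rename(v):
        if isinstance(v, int):
            return str(v)
        if v not in names:
            names[v] = f"v{len(names)}"
        return names[v]

    out = []
    for dest, op, lhs, rhs in kept:
        a, b = rename(lhs), rename(rhs)  # name operands before the result
        out.append(f"{rename(dest)}={op}({a},{b})")
    return ";".join(out)


def fragment_set(procedure):
    """A procedure is a list of fragments, compared only by canonical form."""
    return {canonicalize(f) for f in procedure}


def global_weights(corpus):
    """Weight a fragment by -log of its (smoothed) frequency in a sample corpus."""
    n = len(corpus)
    counts = Counter(f for proc in corpus for f in fragment_set(proc))
    # log((n + 1) / (count + 1)) == -log(p): unseen fragments get log(n + 1),
    # fragments present in every sampled procedure get 0.
    return lambda f: math.log((n + 1) / (counts[f] + 1))


def similarity(query, target, weight):
    """Lift fragment equality to a procedure score: sum weights of shared fragments."""
    shared = fragment_set(query) & fragment_set(target)
    return sum(weight(f) for f in shared)


# The same computation as two hypothetical compilers might emit it.
o0 = [("t1", "mul", 2, 4), ("t2", "add", "x", "t1"), ("r", "mul", "t2", "y")]
o2 = [("a", "add", 8, "rdi"), ("b", "mul", "a", "rsi")]
prologue = [("s", "sub", "sp", 16)]  # boilerplate present in every procedure
other = [("z", "mul", "p", 3)]

corpus = [[prologue, other], [prologue], [prologue, [("q", "add", "m", 1)]]]
w = global_weights(corpus)
query, target, unrelated = [o0, prologue], [o2, prologue], [other, prologue]

print(canonicalize(o0))  # v1=add(v0,8);v3=mul(v1,v2)
print(canonicalize(o0) == canonicalize(o2))  # True
print(round(similarity(query, target, w), 3))  # 1.386
print(similarity(query, unrelated, w))  # 0.0
```

Because matching reduces to intersecting sets of strings, canonical forms can be computed once per procedure and compared cheaply, which is one way to see why a syntactic comparison can scale to many pairs. The `prologue` fragment illustrates the role of the global context in this sketch: it matches in every pair, but since it occurs in every sampled procedure it contributes nothing to the score.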

We have implemented our technique in a tool called GitZ and performed an extensive evaluation. We show that GitZ can perform millions of comparisons efficiently and find similarity with high accuracy.


          Published in

          PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
          June 2017
          708 pages
          ISBN: 9781450349888
          DOI: 10.1145/3062341

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 406 of 2,067 submissions, 20%
