ABSTRACT
We present a scalable approach for establishing similarity between stripped binaries (with no debug information). The main challenge in binary similarity is to establish similarity even when the code has been compiled using different compilers, with different optimization levels, or targeting different architectures. Overcoming this challenge, while avoiding false positives, is invaluable to the process of reverse engineering and the process of locating vulnerable code.
We present a technique that is scalable and precise, as it alleviates the need for heavyweight semantic comparison by performing out-of-context re-optimization of procedure fragments. It works by decomposing binary procedures into comparable fragments and transforming them to a canonical, normalized form using the compiler optimizer, which enables finding equivalent fragments through simple syntactic comparison. We use a statistical framework built by analyzing samples collected "in the wild" to generate a global context that quantifies the significance of each pair of fragments, and use it to lift pairwise fragment equivalence to whole procedure similarity.
We have implemented our technique in a tool called GitZ and performed an extensive evaluation. We show that GitZ is able to perform millions of comparisons efficiently, and find similarity with high accuracy.
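The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the GitZ implementation. The `canonicalize` function is a toy stand-in for re-optimization by a compiler optimizer (it only renames `%`-prefixed temporaries by order of first appearance), and the log-based weighting in `similarity` is an assumed scoring rule chosen for illustration, not the formula from the paper. All function names and the instruction syntax are hypothetical.

```python
import hashlib
import math
import re
from collections import Counter


def canonicalize(fragment):
    """Toy normalization (assumption): rename %-temporaries by first appearance.

    The paper normalizes fragments with a compiler optimizer; this renaming
    only illustrates the idea of mapping equivalent fragments to one form.
    """
    names = {}

    def rename(match):
        return names.setdefault(match.group(0), f"t{len(names)}")

    return [re.sub(r"%\w+", rename, instr.strip()) for instr in fragment]


def fragment_hash(fragment):
    """Hash the canonical text so equivalence becomes a syntactic comparison."""
    text = "\n".join(canonicalize(fragment))
    return hashlib.sha256(text.encode()).hexdigest()


def procedure_signature(fragments):
    """A procedure is represented as the set of its canonical fragment hashes."""
    return {fragment_hash(f) for f in fragments}


def build_global_context(sampled_procedures):
    """Count in how many sampled procedures each canonical fragment occurs."""
    counts = Counter()
    for fragments in sampled_procedures:
        counts.update(procedure_signature(fragments))
    return counts, len(sampled_procedures)


def similarity(query, target, context):
    """Sum the weights of shared fragments; rarer fragments weigh more.

    The smoothed log weight is an assumption made for this sketch.
    """
    counts, total = context
    shared = procedure_signature(query) & procedure_signature(target)
    return sum(math.log((total + 1) / (counts[h] + 1)) for h in shared)


# Hypothetical fragments: each is a list of instruction strings.
query = [["%a = add %x, 1", "%b = mul %a, %a"], ["%r = load %p"]]
target = [["%v1 = add %v0, 1", "%v2 = mul %v1, %v1"], ["%q = load %z"]]
wild = [
    [["%a = load %b"]],
    [["%c = load %d"]],
    [["%e = load %f"], ["%g = sub %h, 2"]],
]
context = build_global_context(wild)
score = similarity(query, target, context)
```

In this sketch the `load` fragment appears in every sampled procedure, so it contributes nothing to the score, while the shared `add`/`mul` fragment is unseen in the sample and carries all of the weight. This mirrors the role of the global context in discounting fragments that are common across unrelated code.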
Similarity of binaries through re-optimization. PLDI '17.