research-article

Similarity of binaries through re-optimization

Authors:
Yaniv David (Technion, Israel), Nimrod Partush (Technion, Israel), Eran Yahav (Technion, Israel)

PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2017, Pages 79–94, https://doi.org/10.1145/3062341.3062387

Published: 14 June 2017

ABSTRACT

We present a scalable approach for establishing similarity between stripped binaries (with no debug information). The main challenge in binary similarity is to establish similarity even when the code has been compiled using different compilers, with different optimization levels, or targeting different architectures. Overcoming this challenge, while avoiding false positives, is invaluable to the process of reverse engineering and the process of locating vulnerable code.

We present a technique that is scalable and precise, as it alleviates the need for heavyweight semantic comparison by performing out-of-context re-optimization of procedure fragments. It works by decomposing binary procedures into comparable fragments and transforming them into a canonical, normalized form using the compiler optimizer, which enables finding equivalent fragments through simple syntactic comparison. We use a statistical framework, built by analyzing samples collected "in the wild", to generate a global context that quantifies the significance of each pair of fragments, and use it to lift pairwise fragment equivalence to whole-procedure similarity.
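The pipeline described above (canonicalize fragments, compare them syntactically, weight matches by how rare they are) can be illustrated with a minimal sketch. This is not the GitZ implementation: the tuple instruction format, the `canonicalize` renaming pass (a toy stand-in for running a real compiler optimizer), and the inverse-frequency weighting in `similarity` are all assumptions made for illustration, and every function name here is hypothetical.

```python
import hashlib
from collections import Counter

# Assumed toy instruction format: (dest, op, arg1, arg2); ints are constants.
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}

def canonicalize(fragment):
    """Toy stand-in for re-optimization: rename variables by order of
    first use and move constants to the right in commutative operations."""
    names = {}

    def norm(operand):
        if isinstance(operand, int):
            return operand
        return names.setdefault(operand, f"v{len(names)}")

    out = []
    for dest, op, a, b in fragment:
        if op in COMMUTATIVE and isinstance(a, int) and not isinstance(b, int):
            a, b = b, a
        a, b = norm(a), norm(b)  # name inputs before the destination
        out.append((norm(dest), op, a, b))
    return tuple(out)

def fingerprints(procedure):
    """Hash each canonical fragment so equivalence is a plain set lookup."""
    return {hashlib.sha1(repr(canonicalize(f)).encode()).hexdigest()
            for f in procedure}

def build_context(sample_procedures):
    """Count, per fragment hash, how many sampled procedures contain it."""
    counts = Counter()
    for proc in sample_procedures:
        counts.update(fingerprints(proc))
    return counts, len(sample_procedures)

def similarity(query, target, counts, total):
    """Assumed weighting: each shared fragment contributes the inverse of
    its frequency in the sample, so rare fragments count for more."""
    shared = fingerprints(query) & fingerprints(target)
    return sum(total / max(counts[h], 1) for h in shared)

# Two fragments that differ only in register names and operand order.
frag_q = [("t1", "add", "eax", 4), ("t2", "mul", "t1", "ebx")]
frag_t = [("r9", "add", 4, "rdi"), ("r10", "mul", "r9", "rsi")]
common = [("t0", "add", "esp", 8)]  # a fragment that appears everywhere

counts, total = build_context([[common], [common], [common, frag_t], [common]])
proc_q, proc_t, proc_u = [common, frag_q], [common, frag_t], [common]

print(similarity(proc_q, proc_t, counts, total))
print(similarity(proc_q, proc_u, counts, total))
```

In this sketch, the pair that shares the rare fragment scores higher than the pair that shares only the ubiquitous one, which is the role the abstract assigns to the global context.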

We have implemented our technique in a tool called GitZ and performed an extensive evaluation. We show that GitZ is able to perform millions of comparisons efficiently, and find similarity with high accuracy.

Published in
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2017, 708 pages
ISBN: 9781450349888
DOI: 10.1145/3062341
General Chair: Albert Cohen (Inria, France)
Program Chair: Martin Vechev (DeepCode, Switzerland / ETH Zurich, Switzerland)
ACM SIGPLAN Notices, Volume 52, Issue 6 (PLDI '17), June 2017, 708 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3140587
Editor: Matthew Fluet
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
binary code search
static binary analysis
statistical similarity