DOI: 10.1145/2594291.2594343

Tracelet-based code search in executables

Published: 09 June 2014

Abstract

We address the problem of code search in executables. Given a function in binary form and a large code base, our goal is to statically find similar functions in the code base. Towards this end, we present a novel technique for computing similarity between functions. Our notion of similarity is based on decomposition of functions into tracelets: continuous, short, partial traces of an execution. To establish tracelet similarity in the face of low-level compiler transformations, we employ a simple rewriting engine. This engine uses constraint solving over alignment constraints and data dependencies to match registers and memory addresses between tracelets, bridging the gap between tracelets that are otherwise similar. We have implemented our approach and applied it to find matches in over a million binary functions. We compare tracelet matching to approaches based on n-grams and graphlets and show that tracelet matching obtains dramatically better precision and recall.
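
To make the decomposition step concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tracelet can be modeled as a path of exactly k consecutive basic blocks in a function's control-flow graph; the abstract only describes tracelets as continuous, short, partial traces. The graph encoding, the function name `extract_tracelets`, and the block names are hypothetical, and the rewriting engine that matches registers and memory addresses is not modeled here.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: basic-block name -> names of its successor blocks.
CFG = Dict[str, List[str]]


def extract_tracelets(cfg: CFG, k: int) -> List[Tuple[str, ...]]:
    """Return every path of exactly k consecutive basic blocks in cfg.

    Assumption for illustration: a tracelet is a k-block path.
    Paths shorter than k (ending at a block with no successors) are dropped.
    """
    tracelets: List[Tuple[str, ...]] = []

    def extend(path: List[str]) -> None:
        if len(path) == k:
            tracelets.append(tuple(path))
            return
        # Follow each outgoing edge; recursion depth is bounded by k,
        # so loops in the graph cannot cause unbounded recursion.
        for succ in cfg.get(path[-1], []):
            extend(path + [succ])

    for block in cfg:
        extend([block])
    return tracelets


# A small diamond-shaped function: entry branches to then/else, both reach exit.
cfg = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
tracelets = extract_tracelets(cfg, 3)
```

For the diamond-shaped graph above and k = 3, the sketch yields the two paths through the branch. A similarity measure between two functions could then be built by comparing their tracelet sets pairwise, which is where the alignment and rewriting described in the abstract would come in.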



Published In

PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2014
619 pages
ISBN:9781450327848
DOI:10.1145/2594291
  • ACM SIGPLAN Notices, Volume 49, Issue 6
    PLDI '14
    June 2014
    598 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2666356
    Editor: Andy Gill
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. static binary analysis
  2. x86
  3. x86-64

Qualifiers

  • Research-article

Conference

PLDI '14

Acceptance Rates

PLDI '14 Paper Acceptance Rate 52 of 287 submissions, 18%;
Overall Acceptance Rate 406 of 2,067 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 72
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 17 Jan 2025

Cited By

  • (2024) Assembly Function Recognition in Embedded Systems as an Optimization Problem. Mathematics 12:5 (658). DOI: 10.3390/math12050658. Online publication date: 23-Feb-2024
  • (2024) A Survey of Binary Code Similarity Detection Techniques. Electronics 13:9 (1715). DOI: 10.3390/electronics13091715. Online publication date: 29-Apr-2024
  • (2024) Shining Light on the Inter-procedural Code Obfuscation: Keep Pace with Progress in Binary Diffing. ACM Transactions on Architecture and Code Optimization. DOI: 10.1145/3701992. Online publication date: 28-Oct-2024
  • (2024) ARCTURUS: Full Coverage Binary Similarity Analysis with Reachability-guided Emulation. ACM Transactions on Software Engineering and Methodology 33:4 (1-31). DOI: 10.1145/3640337. Online publication date: 11-Jan-2024
  • (2024) PPT4J: Patch Presence Test for Java Binaries. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-12). DOI: 10.1145/3597503.3639231. Online publication date: 20-May-2024
  • (2024) LibvDiff: Library Version Difference Guided OSS Version Identification in Binaries. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-12). DOI: 10.1145/3597503.3623336. Online publication date: 20-May-2024
  • (2024) BinAug: Enhancing Binary Similarity Analysis with Low-Cost Input Repairing. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3623328. Online publication date: 20-May-2024
  • (2024) UniBin: Assembly Semantic-enhanced Binary Vulnerability Detection without Disassembly. Information Sciences (121605). DOI: 10.1016/j.ins.2024.121605. Online publication date: Oct-2024
  • (2023) BinBench: a benchmark for x64 portable operating system interface binary function representations. PeerJ Computer Science 9 (e1286). DOI: 10.7717/peerj-cs.1286. Online publication date: 1-Jun-2023
  • (2023) A POI Recommendation Model for Intelligent Systems Using AT-LSTM in Location-Based Social Network Big Data. International Journal on Semantic Web & Information Systems 19:1 (1-15). DOI: 10.4018/IJSWIS.330246. Online publication date: 12-Sep-2023
