DOI: 10.1145/2594291.2594343
research-article

Tracelet-based code search in executables

Published: 09 June 2014

ABSTRACT

We address the problem of code search in executables. Given a function in binary form and a large code base, our goal is to statically find similar functions in the code base. Towards this end, we present a novel technique for computing similarity between functions. Our notion of similarity is based on decomposition of functions into tracelets: continuous, short, partial traces of an execution. To establish tracelet similarity in the face of low-level compiler transformations, we employ a simple rewriting engine. This engine uses constraint solving over alignment constraints and data dependencies to match registers and memory addresses between tracelets, bridging the gap between tracelets that are otherwise similar. We have implemented our approach and applied it to find matches in over a million binary functions. We compare tracelet matching to approaches based on n-grams and graphlets and show that tracelet matching obtains dramatically better precision and recall.
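
As a rough illustration of the decomposition described above, the following Python sketch models a tracelet as a path of k consecutive basic blocks in a control-flow graph, and compares two instruction sequences with a plain edit distance. The CFG encoding, the names `extract_tracelets`, `edit_distance`, and `EXAMPLE_CFG`, and the toy instruction strings are all assumptions made for this example; the abstract does not specify the data structures or the scoring function actually used.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical CFG encoding: basic-block label -> labels of successor blocks.
CFG = Dict[str, List[str]]

# Toy CFG for an if/else diamond (illustrative only).
EXAMPLE_CFG: CFG = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}


def extract_tracelets(cfg: CFG, k: int) -> List[Tuple[str, ...]]:
    """Return every path of exactly k consecutive basic blocks in cfg."""
    if k < 1:
        raise ValueError("k must be at least 1")
    tracelets: List[Tuple[str, ...]] = []

    def extend(path: Tuple[str, ...]) -> None:
        # The path length is bounded by k, so this terminates even on cyclic CFGs.
        if len(path) == k:
            tracelets.append(path)
            return
        for succ in cfg.get(path[-1], []):
            extend(path + (succ,))

    for block in cfg:
        extend((block,))
    return tracelets


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic dynamic-programming edit distance over instruction strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

Two sequences that differ only in register names, such as `["mov eax, 1", "add eax, ebx", "ret"]` and `["mov ecx, 1", "add ecx, ebx", "ret"]`, are penalized by this plain distance for every renamed instruction. That is the kind of gap the rewriting engine described in the abstract is meant to bridge by matching registers and memory addresses before comparison.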

        • Published in

          PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
          June 2014
          619 pages
          ISBN: 9781450327848
          DOI: 10.1145/2594291
          • ACM SIGPLAN Notices, Volume 49, Issue 6
            PLDI '14
            June 2014
            598 pages
            ISSN: 0362-1340
            EISSN: 1558-1160
            DOI: 10.1145/2666356
            • Editor: Andy Gill

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          PLDI '14 Paper Acceptance Rate: 52 of 287 submissions, 18%
          Overall Acceptance Rate: 406 of 2,067 submissions, 20%
