Function matching between binary executables: efficient algorithms and features

  • Original Paper
  • Journal of Computer Virology and Hacking Techniques

Abstract

Binary diffing consists of identifying the syntactic and semantic differences between two programs in binary form when source code is unavailable. It can be reduced to a graph isomorphism problem between the Control Flow Graphs (CFGs), Call Graphs (CGs) or other graph representations of the compared programs. Here we present REveal, a prototype tool which implements a binary diffing algorithm and an associated set of features extracted from a binary’s CG and CFGs. Additionally, we explore the potential of applying Markov lumping techniques to function CFGs. The proposed algorithm and features are evaluated in a series of experiments on executables compiled for i386, amd64, arm and aarch64. The effectiveness of REveal is then assessed in a second series of experiments, which involve clustering a corpus of 18 malware samples into 5 malware families. REveal’s results are compared against those produced by Diaphora, the most widely used publicly available binary diffing software. We conclude that REveal improves the state of the art in binary diffing: in most of the experiments conducted it achieves higher matching scores, at the cost of a slight increase in running time. Furthermore, REveal successfully partitions the malware corpus into clusters consisting of samples of the same malware family.
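To make the idea of feature-based function matching concrete, the following is a minimal sketch and not REveal's actual algorithm or feature set. It assumes each function's CFG is given as a mapping from basic-block id to successor ids, computes a cheap structural signature (node count, edge count, sorted in/out-degree pairs), and pairs functions whose signature is unique in both binaries. The names `cfg_signature` and `match_functions`, the function names, and the sample graphs are all hypothetical. Equal signatures are a necessary condition for isomorphism, not a sufficient one.

```python
from collections import Counter, defaultdict


def cfg_signature(cfg):
    """Return an order-independent structural signature of a CFG.

    `cfg` maps each basic-block id to the list of its successor ids.
    Isomorphic CFGs always share a signature; the converse need not hold.
    """
    in_deg = Counter()
    for succs in cfg.values():
        for s in succs:
            in_deg[s] += 1
    # One (in-degree, out-degree) pair per basic block, sorted so that
    # block numbering does not affect the result.
    degrees = sorted((in_deg[n], len(succs)) for n, succs in cfg.items())
    n_edges = sum(len(succs) for succs in cfg.values())
    return (len(cfg), n_edges, tuple(degrees))


def match_functions(funcs_a, funcs_b):
    """Pair functions whose signature occurs exactly once in each binary."""
    by_sig_a, by_sig_b = defaultdict(list), defaultdict(list)
    for name, cfg in funcs_a.items():
        by_sig_a[cfg_signature(cfg)].append(name)
    for name, cfg in funcs_b.items():
        by_sig_b[cfg_signature(cfg)].append(name)

    matches = {}
    for sig, names_a in by_sig_a.items():
        names_b = by_sig_b.get(sig, [])
        # Ambiguous signatures are left unmatched in this sketch.
        if len(names_a) == 1 and len(names_b) == 1:
            matches[names_a[0]] = names_b[0]
    return matches


# Hypothetical CFGs: a diamond (if/else) and a simple loop in each binary,
# plus one single-block function that exists only in the second binary.
old = {
    "sub_401000": {0: [1, 2], 1: [3], 2: [3], 3: []},
    "sub_401050": {0: [1], 1: [0, 2], 2: []},
}
new = {
    "sub_402000": {10: [11], 11: [10, 12], 12: []},
    "sub_402040": {20: [21, 22], 21: [23], 22: [23], 23: []},
    "sub_402090": {0: []},
}
matches = match_functions(old, new)
```

Here the diamond-shaped and loop-shaped functions are paired across the two binaries regardless of block numbering, while the single-block function stays unmatched. A practical tool would need richer features and a strategy for resolving functions that share a signature.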


Notes

  1. It should also be mentioned that Diaphora does not implement any kind of propagation phase.

  2. Even though the CFG (and the CG mentioned in the sequel) are digraphs, we will follow standard usage and call them graphs.

  3. Initial pre-processing steps of de-obfuscating and unpacking the executables may be necessary. In fact, such preparatory steps have been used in the past by various authors (e.g. [3, 4, 17]).

  4. Where the \({\textit{depth}}()\) function is the one resulting from the aforementioned DFS.

  5. For the exact meaning of the instruction form constants see [19].

References

  1. Aho, A., Lam, M., Sethi, R., Ullman, J.: Compilers: Principles, Techniques, and Tools, 2nd edn. Addison-Wesley Longman Publishing Co., Boston (2006)

  2. Bourquin, M., King, A., Robbins, E.: BinSlayer: accurate comparison of binary executables. In: 2nd ACM SIGPLAN Program Protection and Reverse Engineering (2013)

  3. Cesare, S., Xiang, Y.: Classification of malware using structured control flow. In: Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing (AusPDC 2010) (2010)

  4. Cesare, S., Xiang, Y., Zhou, W.: Control flow-based malware variant detection. IEEE Trans. Dependable Secur Comput 11, 307–317 (2013)

  5. Deo, N.: Graph Theory with Applications to Engineering and Computer Science. Prentice-Hall Inc, Upper Saddle River (1974)

  6. Derisavi, S., Hermanns, H., Sanders, W.: Optimal state-space lumping in Markov chains. Inf. Process. Lett. 87, 309–315 (2003)

  7. Koret, J.: Diaphora: A Free and Open Source Program Diffing Tool [Online]. http://diaphora.re/. Accessed 15 Apr 2019

  8. Dullien, T., Rolles, R.: Graph-based comparison of executable objects. In: Proceedings of the Symposium sur la Securite des Technologies de l’Information et des Communications (2005)

  9. Dullien, T., Carrera, E., Eppler, S. M., Porst, S.: Automated attacker correlation for malicious code. In: NATO Information Systems Technology (IST) 091 (2010)

  10. Eschweiler, S., Yakdan, K., Gerhards-Padilla, E.: discovRE: efficient cross-architecture identification of bugs in binary code. In: SP ’15 Proceedings of the 2015 IEEE Symposium on Security and Privacy (2016)

  11. Flake, H.: Structural comparison of executable objects. In: Proceedings of the IEEE Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA) (2004)

  12. Gao, D., Reiter, M., Song, D.: BinHunt: automatically finding semantic differences in binary programs. In: Information and Communications Security, pp. 238–255 (2008)

  13. Hex-Rays: IDA Pro [Online]. https://www.hex-rays.com/products/ida/. Accessed 15 Apr 2019

  14. Henderson, T.A.D., Podgurski, A.: Sampling code clones from program dependence graphs with GRAPLE. In: SWAN 2016 Proceedings of the 2nd International Workshop on Software Analytics (2016)

  15. Howard, R.: Dynamic Probabilistic Systems. Volume I: Markov Models. Wiley, Hoboken (1971)

  16. Howard, R.: Dynamic Probabilistic Systems. Volume II: Semi-Markov and Decision Processes. Wiley, Hoboken (1971)

  17. Hu, X., Chiueh, T., Shin, K.G.: Large-scale malware indexing using function-call graphs. In: Computer and Communications Security, pp. 611–620 (2009)

  18. Intel: Intel X86 Encoder Decoder Software Library [Online]. https://software.intel.com/en-us/articles/xed-x86-encoder-decoder-software-library. Accessed 15 Apr 2019

  19. Intel: Intel X86 Encoder Decoder [Online]. https://intelxed.github.io/ref-manual/xed-iform-enum_8h.html. Accessed 15 Apr 2019

  20. Jurczyk, M.: Using Binary Diffing to Discover Windows Kernel Memory Disclosure Bugs [Online]. https://googleprojectzero.blogspot.gr/2017/10/using-binary-diffing-to-discover.html. Accessed 15 Apr 2019

  21. Karamitas, C.: Python Bindings for Intel’s XED [Online]. https://github.com/huku-/pyxed. Accessed 15 Apr 2019

  22. Karamitas, C., Kehagias, A.: Efficient Features for function matching between binary executables. In: 2018 IEEE 25th Int Conf Softw Anal Evol Reengineering (SANER), vol. 1, pp. 335–345 (2018)

  23. Kostakis, O., Kinable, J., Mahmoudi, H., Mustonen, K.: Improved call graph comparison using simulated annealing. In: Proceedings of the 2011 ACM Symposium on Applied Computing (2011)

  24. Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. In: Soviet Physics Doklady, pp. 707–710 (1966)

  25. Ming, J., Pan, M., Gao, D.: iBinHunt: binary hunting with inter-procedural control flow. In: Lecture Notes in Computer Science, pp. 92–109 (2013)

  26. Ming, J., Xu, D., Jiang, Y., Wu, D.: BinSim: trace-based semantic binary diffing via system call sliced segment equivalence checking. In: 26th USENIX Security Symposium (USENIX Security 17) (2017)

  27. McAfee: McAfee Labs Threats Report April (2017) [Online]. https://www.mcafee.com/us/resources/reports/rp-quarterly-threats-mar-2017.pdf. Accessed 15 Apr 2019

  28. Panda Security: Pandalabs Quarterly Report Q1 (2017) [Online]. http://www.pandasecurity.com/mediacenter/src/uploads/2017/05/Pandalabs-2017-T1-EN.pdf. Accessed 15 Apr 2019

  29. Ramalingam, G.: On loops, dominators, and dominance frontiers. In: PLDI’00 Proceedings of the ACM SIGPLAN 2000 conference on Programming Language Design and Implementation, pp. 233–241 (2000)

  30. SafeCorp: Detecting Software IP Theft Using CodeMatch [Online]. https://www.safe-corp.com/documents/CodeMatch_Whitepaper.pdf. Accessed 15 Apr 2019

  31. Tarjan, R.: Testing flow graph reducibility. In: STOC’73 Proceedings of the Fifth Annual ACM Symposium on Theory of Computing, pp. 96–107 (1973)

  32. Valmari, A., Franceschinis, G.: Simple O(mlogn) time Markov chain lumping. In: TACAS’10 Proceedings of the 16th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 38–52 (2010)

  33. Wang, Z., Pierce, K., McFarling, S.: BMAT: a binary matching tool. In: Second ACM Workshop on Feedback-Directed and Dynamic Optimization (1999)

  34. Wang, Z., Pierce, K., McFarling, S.: BMAT: a binary matching tool for stale profile propagation. J Instr Level Parallel 2, 1–20 (2000)

  35. Zynamics: BinDiff [Online]. https://www.zynamics.com/bindiff.html. Accessed 15 Apr 2019

Author information

Corresponding author

Correspondence to Chariton Karamitas.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is a significantly expanded version of [22], which was presented at the 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). Since our last publication, we have received useful and constructive feedback from various individuals whom we would like to thank: Joxean “matalaz” Koret, for developing and open-sourcing Diaphora; Balint “buherator” Varga-Perke, for the feedback and discussions on binary diffing; and J-Michael Roberts, for running VirusShare, a priceless resource for malware-related research, and for giving us access to his incredible malware database. Last but not least, we would like to thank Shaul Holtzman and Intezer for giving us access to their platform, code-named Analyze, and for providing us with unpacked executables of the malware samples analyzed in Sect. 6.

About this article

Cite this article

Karamitas, C., Kehagias, A. Function matching between binary executables: efficient algorithms and features. J Comput Virol Hack Tech 15, 307–323 (2019). https://doi.org/10.1007/s11416-019-00339-6

  • DOI: https://doi.org/10.1007/s11416-019-00339-6