Abstract
Binary diffing consists in comparing syntactic and semantic differences of two programs in binary form, when source code is unavailable. It can be reduced to a graph isomorphism problem between the Control Flow Graphs, Call Graphs or other forms of graphs of the compared programs. Here we present REveal, a prototype tool which implements a binary diffing algorithm and an associated set of features, extracted from a binary’s CG and CFGs. Additionally, we explore the potential of applying Markov lumping techniques on function CFGs. The proposed algorithm and features are evaluated in a series of experiments on executables compiled for i386, amd64, arm and aarch64. Furthermore, the effectiveness of our prototype tool, code-named REveal, is assessed in a second series of experiments involving clustering of a corpus of 18 malware samples into 5 malware families. REveal’s results are compared against those produced by Diaphora, the most widely used binary diffing software of the public domain. We conclude that REveal improves the state-of-the-art in binary diffing by achieving higher matching scores, obtained at the cost of a slight running time increase, in most of the experiments conducted. Furthermore, REveal successfully partitions the malware corpus into clusters consisting of samples of the same malware family.



Similar content being viewed by others
Notes
It should also be mentioned that Diaphora does not implement any kind of propagation phase.
Even though the CFG (and the CG mentioned in the sequel) are digraphs we will follow standard usage and call them graphs.
Where the \({\textit{depth}}()\) function is the one resulting from the aforementioned DFS.
For the exact meaning of the instruction form constants see [19].
References
Aho, A., Lam, M., Sethi, R., Ullmanr, J.: Compilers: Principles, Techniques, and Tools, 2nd edn. Addison-Wesley Longman Publishing Co., Boston (2006)
Bourquin, M., King, A., Robbins, E.: BinSlayer: accurate comparison of binary executables. In: 2nd ACM SIGPLAN Program Protection and Reverse Engineering (2013)
Cesare, S., Xiang, Y.: Classification of malware using structured control flow. In: Proceedings of the 8th Australasian Symposium on Parallel and Distributed Computing (AusPDC 2010) (2010)
Cesare, S., Xiang, Y., Zhou, W.: Control flow-based malware variant detection. IEEE Trans. Dependable Secur Comput 11, 307–317 (2013)
Deo, N.: Graph Theory with Applications to Engineering and Computer Science. Prentice-Hall Inc, Upper Saddle River (1974)
Derisavi, S., Hermanns, H., Sanders, W.: Optimal state-space lumping in Markov chains. Inf. Process. Lett. 87, 309–315 (2003)
Koret, J.: Diaphora: A Free and Open Source Program Diffing Tool [Online]. http://diaphora.re/. Accessed 15 Apr 2019
Dullien, T., Rolles, R.: Graph-based comparison of executable objects. In: Proceedings of the Symposium sur la Securite des Technologies de l’Information et des Communications (2005)
Dullien, T., Carrera, E., Eppler, S. M., Porst, S.: Automated attacker correlation for malicious code. In: NATO Information Systems Technology (IST) 091 (2010)
Eschweiler, S., Yakdan, K., Gerhards-Padilla, E.: discovRE: efficient cross-architecture identification of bugs in binary code. In: SP ’15 Proceedings of the 2015 IEEE Symposium on Security and Privacy (2016)
Flake, H.: Structural comparison of executable objects. In: Proceedings of the IEEE Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA) (2004)
Gao, D., Reiter, M., Song, D.: BinHunt: automatically finding semantic differences in binary programs. In: Information and Communications Security, pp. 238–255 (2008)
Hex-Rays: IDA Pro [Online]. https://www.hex-rays.com/products/ida/. Accessed 15 Apr 2019
Henderson, T.A.D., Podgurski, A.: Sampling code clones from program dependence graphs with GRAPLE. In: SWAN 2016 Proceedings of the 2nd International Workshop on Software Analytics (2016)
Howard, R.: Dynamic Probabilistic Systems: volume I: Markov Models. Wiley, Hoboken (1971)
Howard, R.: Dynamic Probabilistic Systems. Volume II: Semi-Markov and Decision Processes. Wiley, Hoboken (1971)
Hu, X., Chiueh, T., Shin, K.G.: Large-scale malware indexing using function-call graphs. In: Computer and Communications Security, pp. 611–620 (2009)
Intel: Intel X86 Encoder Decoder Software Library [Online]. https://software.intel.com/en-us/articles/xed-x86-encoder-decoder-software-library. Accessed 15 Apr 2019
Intel: Intel X86 Encoder Decoder [Online]. https://intelxed.github.io/ref-manual/xed-iform-enum_8h.html. Accessed 15 Apr 2019
Jurczyk, M.: Using Binary Diffing to Discover Windows Kernel Memory Disclosure Bugs [Online]. https://googleprojectzero.blogspot.gr/2017/10/using-binary-diffing-to-discover.html. Accessed 15 Apr 2019
Karamitas, C.: Python Bindings for Intel’s XED [Online]. https://github.com/huku-/pyxed. Accessed 15 Apr 2019
Karamitas, C., Kehagias, A.: Efficient Features for function matching between binary executables. In: 2018 IEEE 25th Int Conf Softw Anal Evol Reengineering (SANER), vol. 1, pp. 335–345 (2018)
Kostakis, O., Kinable, J., Mahmoudi, H., Mustonen, K.: Improved call graph comparison using simulated annealing. In: Proceedings of the 2011 ACM Symposium on Applied Computing (2011)
Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. In: Soviet Physics Doklady, pp. 707–710 (1966)
Ming, J., Pan, M., Gao, D.: iBinHunt: binary hunting with inter-procedural control flow. In: Lecture Notes in Computer Science, pp. 92–109 (2013)
Ming, J., Xu, D., Jiang, Y., Wu, D.: BinSim: trace-based semantic binary diffing via system call sliced segment equivalence checking. In: 26th USENIX Security Symposium (USENIX Security 17) (2017)
McAfee: McAfee Labs Threats Report April (2017) [Online]. https://www.mcafee.com/us/resources/reports/rp-quarterly-threats-mar-2017.pdf. Accessed 15 Apr 2019
Panda Security: Pandalabs Quarterly Report Q1 (2017) [Online]. http://www.pandasecurity.com/mediacenter/src/uploads/2017/05/Pandalabs-2017-T1-EN.pdf. Accessed 15 Apr 2019
Ramalingam, G.: On loops, dominators, and dominance frontiers. In: PLDI’00 Proceedings of the ACM SIGPLAN 2000 conference on Programming Language Design and Implementation, pp. 233–241 (2000)
SafeCorp: Detecting Software IP Theft Using CodeMatch [Online]. https://www.safe-corp.com/documents/CodeMatch_Whitepaper.pdf. Accessed 15 Apr 2019
Tarjan, R.: Testing flow graph reducibility. In: STOC’73 Proceedings of the Fifth Annual ACM Symposium on Theory of Computing, pp. 96–107 (1973)
Valmari, A., Franceschinis, G.: Simple O(mlogn) time Markov chain lumping. In: TACAS’10 Proceedings of the 16th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 38–52 (2010)
Wang, Z., Pierce, K., McFarling, S.: BMAT: a binary matching tool. In: Second ACM Workshop on Feedback-Directed and Dynamic Optimization (1999)
Wang, Z., Pierce, K., McFarling, S.: BMAT: a binary matching tool for stale profile propagation. J Instr Level Parallel 2, 1–20 (2000)
Zynamics: BinDiff [Online]. https://www.zynamics.com/bindiff.html. Accessed 15 Apr 2019
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This paper is a significantly expanded version of [22] which was presented at 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). Since our last publication, we have received useful and constructive feedback from various individuals we would like to thank; Joxean “matalaz” Koret for developing and open-sourcing Diaphora. Balint “buherator” Varga-Perke for the feedback and discussions on binary diffing. J-Michael Roberts for running VirusShare, a priceless resource for malware related research, and for giving us access to his incredible malware database. Last but not least, we would like to thank Shaul Holtzman and Intezer for giving us access to their platform, code-named Analyze, and for providing us with unpacked executables of the malware samples analyzed in Sect. 6.
Rights and permissions
About this article
Cite this article
Karamitas, C., Kehagias, A. Function matching between binary executables: efficient algorithms and features. J Comput Virol Hack Tech 15, 307–323 (2019). https://doi.org/10.1007/s11416-019-00339-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11416-019-00339-6