MalwareHunt: semantics-based malware diffing speedup by normalized basic block memoization

  • Original Paper
  • Journal of Computer Virology and Hacking Techniques

Abstract

The key challenge of software reverse engineering is that the source code of the program under investigation is typically not available. Identifying differences between two executable binaries (binary diffing) can reveal valuable information in the absence of source code, such as vulnerability patches, software plagiarism evidence, and malware variant relations. Recently, a new binary diffing method based on symbolic execution and constraint solving has been proposed to find code pairs with the same semantics even when they differ syntactically. Such a semantics-based method captures the intrinsic differences and similarities of binary code, making it a compelling choice for analyzing highly obfuscated malicious programs. However, due to the nature of symbolic execution, semantics-based binary diffing suffers from a significant performance slowdown, which hinders its use on large numbers of malware samples. In this paper, we attempt to mitigate the high overhead of semantics-based binary diffing, with application to malware lineage inference. We first study the key obstacles that contribute to the performance bottleneck. We then propose normalized basic block memoization to speed up semantics-based binary diffing. We introduce a union-find set structure that records semantically equivalent basic blocks. Maintaining the union-find structure across successive comparisons allows direct reuse of previously computed results. Moreover, we use a set of enhanced optimization methods to further reduce the number of constraint solver invocations. We have implemented our technique, called MalwareHunt, on top of a trace-oriented binary diffing tool and evaluated it on 15 polymorphic and metamorphic malware families. We perform intra-family comparisons for the purpose of malware lineage inference. Our experimental results show that MalwareHunt can accelerate symbolic execution by 2.8X to 5.3X (4.1X on average) and reduce constraint solver invocations by 3.0X to 6.0X (4.5X on average).
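
The memoization idea summarized in the abstract can be illustrated with a minimal sketch. It is not the MalwareHunt implementation: the names `normalize`, `EquivalenceMemo`, and `toy_solver_check` are hypothetical, the register-renaming normalization is an assumed stand-in for whatever normalization the paper applies, and the toy checker replaces the symbolic execution and constraint solving that a real tool would perform.

```python
import re
from typing import Callable, Dict, Tuple

# A basic block is modelled as a tuple of instruction strings (an assumption).
Block = Tuple[str, ...]

_REGISTER = re.compile(r"\b(eax|ebx|ecx|edx|esi|edi)\b")


def normalize(block: Block) -> Block:
    """Rename registers in order of first appearance (assumed normalization)."""
    names: Dict[str, str] = {}

    def rename(match: "re.Match[str]") -> str:
        # A register keeps the placeholder it received on first appearance.
        return names.setdefault(match.group(1), f"r{len(names)}")

    return tuple(_REGISTER.sub(rename, insn) for insn in block)


class EquivalenceMemo:
    """Union-find over normalized blocks; each set holds equivalent blocks."""

    def __init__(self, solver_check: Callable[[Block, Block], bool]) -> None:
        self._solver_check = solver_check  # expensive check, called on a miss
        self._parent: Dict[Block, Block] = {}
        self.solver_calls = 0

    def _find(self, block: Block) -> Block:
        self._parent.setdefault(block, block)
        root = block
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[block] != root:
            next_block = self._parent[block]
            self._parent[block] = root
            block = next_block
        return root

    def equivalent(self, a: Block, b: Block) -> bool:
        root_a = self._find(normalize(a))
        root_b = self._find(normalize(b))
        if root_a == root_b:
            return True  # reuse an earlier result, no solver call
        self.solver_calls += 1
        if self._solver_check(root_a, root_b):
            self._parent[root_a] = root_b  # merge the two equivalence sets
            return True
        return False


def toy_solver_check(a: Block, b: Block) -> bool:
    """Stand-in for the solver: knows one idiom, xor r0, r0 == mov r0, 0."""
    idioms = {"xor r0, r0": "mov r0, 0"}
    canon_a = tuple(idioms.get(insn, insn) for insn in a)
    canon_b = tuple(idioms.get(insn, insn) for insn in b)
    return canon_a == canon_b


memo = EquivalenceMemo(toy_solver_check)
first = memo.equivalent(("xor eax, eax",), ("mov ebx, 0",))   # solver is called
second = memo.equivalent(("xor ecx, ecx",), ("mov ebx, 0",))  # answered from memo
```

In this sketch the second query costs no solver call, because `xor ecx, ecx` normalizes to the same key as `xor eax, eax` and that key was already merged with `mov r0, 0`. Pairs found to be different are not cached here, so they are re-checked on every query.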

Notes

  1. http://vxheaven.org/src.php.

  2. https://www.virustotal.com/.

  3. https://www.cygwin.com.


Acknowledgments

This research was supported in part by the National Science Foundation (NSF) grants CNS-1223710 and CCF-1320605, and the Office of Naval Research (ONR) grant N00014-13-1-0175.

Author information

Corresponding author

Correspondence to Jiang Ming.

Additional information

A preliminary version of this paper appeared in the Proceedings of the 30th IFIP TC-11 SEC International Information Security and Privacy Conference (IFIP SEC’15), Hamburg, Germany, May 26-28, 2015.

Cite this article

Ming, J., Xu, D. & Wu, D. MalwareHunt: semantics-based malware diffing speedup by normalized basic block memoization. J Comput Virol Hack Tech 13, 167–178 (2017). https://doi.org/10.1007/s11416-016-0279-x

  • DOI: https://doi.org/10.1007/s11416-016-0279-x
