MalwareHunt: semantics-based malware diffing speedup by normalized basic block memoization

Ming, Jiang; Xu, Dongpeng; Wu, Dinghao

doi:10.1007/s11416-016-0279-x


Original Paper
Published: 17 May 2016

Volume 13, pages 167–178 (2017)

Journal of Computer Virology and Hacking Techniques



Abstract

The key challenge of software reverse engineering is that the source code of the program under investigation is typically not available. Identifying differences between two executable binaries (binary diffing) can reveal valuable information in the absence of source code, such as vulnerability patches, software plagiarism evidence, and malware variant relations. Recently, a new binary diffing method based on symbolic execution and constraint solving has been proposed to look for code pairs with the same semantics, even though they are ostensibly different in syntax. Such a semantics-based method captures intrinsic differences and similarities of binary code, making it a compelling choice for analyzing highly obfuscated malicious programs. However, due to the nature of symbolic execution, semantics-based binary diffing suffers from a significant performance slowdown, which hinders it from analyzing large numbers of malware samples. In this paper, we attempt to mitigate the high overhead of semantics-based binary diffing, with application to malware lineage inference. We first study the key obstacles that contribute to the performance bottleneck. Then we propose normalized basic block memoization to speed up semantics-based binary diffing. We introduce a union-find set structure that records semantically equivalent basic blocks. Managing the union-find structure during successive comparisons allows direct reuse of previously computed results. Moreover, we utilize a set of enhanced optimization methods to further reduce the number of constraint solver invocations. We have implemented our technique, called MalwareHunt, on top of a trace-oriented binary diffing tool and evaluated it on 15 polymorphic and metamorphic malware families. We perform intra-family comparisons for the purpose of malware lineage inference. Our experimental results show that MalwareHunt can accelerate symbolic execution by 2.8X to 5.3X (4.1X on average) and reduce constraint solver invocations by 3.0X to 6.0X (4.5X on average).
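The memoization idea in the abstract, keeping semantically equivalent basic blocks in a union-find structure so that later comparisons can reuse earlier verdicts, can be illustrated with a small Python sketch. It is not MalwareHunt's implementation. Everything concrete in it is an assumption made for illustration: the toy instruction format, the normalization (dropping `nop`s and renaming registers in order of first use), and the hypothetical `toy_check` function, which stands in for symbolic execution plus constraint solving by comparing final register states on a few concrete inputs. The names `normalize`, `UnionFind`, `BlockMemo`, `run`, and `toy_check` are likewise hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Block = Tuple[str, ...]  # hypothetical toy IR: a basic block as instruction strings
REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}


def normalize(block: Block) -> Block:
    """Toy normalization (assumption): drop nops, rename registers by first use."""
    mapping: Dict[str, str] = {}
    out = []
    for insn in block:
        op, _, rest = insn.strip().partition(" ")
        if op == "nop":
            continue
        operands = []
        for tok in (t.strip() for t in rest.split(",")):
            if tok in REGS:
                tok = mapping.setdefault(tok, f"r{len(mapping)}")
            operands.append(tok)
        out.append(f"{op} {', '.join(operands)}")
    return tuple(out)


class UnionFind:
    """Disjoint-set forest over integer block ids, with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}

    def find(self, x: int) -> int:
        self.parent.setdefault(x, x)  # lazily create a singleton set
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


class BlockMemo:
    """Reuses earlier verdicts; calls the expensive check only when it must."""

    def __init__(self, check: Callable[[Block, Block], bool]) -> None:
        self.check = check  # stand-in for symbolic execution + constraint solver
        self.uf = UnionFind()
        self.ids: Dict[Block, int] = {}  # normalized block -> block id
        self.not_equal: Set[FrozenSet[int]] = set()  # root pairs known to differ
        self.solver_calls = 0

    def _id(self, block: Block) -> int:
        return self.ids.setdefault(normalize(block), len(self.ids))

    def equivalent(self, a: Block, b: Block) -> bool:
        ra, rb = self.uf.find(self._id(a)), self.uf.find(self._id(b))
        if ra == rb:  # same normalized form, or proven equal earlier
            return True
        if frozenset((ra, rb)) in self.not_equal:  # proven different earlier
            return False
        self.solver_calls += 1
        if self.check(a, b):
            self.uf.union(ra, rb)
            return True
        # A stale root pair (after a later union) only costs a redundant check.
        self.not_equal.add(frozenset((ra, rb)))
        return False


OPS = {"mov": lambda d, s: s, "add": lambda d, s: d + s,
       "sub": lambda d, s: d - s, "xor": lambda d, s: d ^ s}


def run(block: Block, state: Dict[str, int]) -> Dict[str, int]:
    """Concretely execute a normalized straight-line block (32-bit wraparound)."""
    regs = dict(state)
    for insn in normalize(block):
        op, _, rest = insn.partition(" ")
        dst, src = (t.strip() for t in rest.split(","))
        val = regs[src] if src in regs else int(src, 0)
        regs[dst] = OPS[op](regs[dst], val) & 0xFFFFFFFF
    return regs


def toy_check(a: Block, b: Block) -> bool:
    """Hypothetical solver stand-in: agreement on samples is evidence, not proof."""
    samples = [{f"r{i}": v * (i + 1) for i in range(6)} for v in (0, 1, 7, 0xFFFF)]
    return all(run(a, s) == run(b, s) for s in samples)


if __name__ == "__main__":
    a = ("mov eax, 0", "add eax, ebx")
    b = ("xor ecx, ecx", "nop", "add ecx, edx")  # semantically equal to a
    c = ("mov esi, 0", "add esi, edi")  # same normalized form as a
    d = ("mov eax, 1", "add eax, ebx")  # differs from a
    memo = BlockMemo(toy_check)
    pairs = [(a, c), (a, b), (b, c), (a, d), (c, d), (b, d)]
    print([memo.equivalent(x, y) for x, y in pairs], memo.solver_calls)
```

Run as a script, it prints `[True, True, True, False, False, False] 2`: six queries need only two calls to the stand-in checker. Blocks `a` and `c` share a normalized form, `b` and `c` are related through the earlier union of `b` with `a`, and the negative verdict for `a` versus `d` is reused for `c` and `b` because it is recorded against equivalence-class roots. A real constraint solver can also prove equivalence, whereas sampled inputs can only refute it.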



Acknowledgments

This research was supported in part by the National Science Foundation (NSF) grants CNS-1223710 and CCF-1320605, and the Office of Naval Research (ONR) grant N00014-13-1-0175.

Author information

Authors and Affiliations

The Pennsylvania State University, University Park, PA, 16802, USA
Jiang Ming, Dongpeng Xu & Dinghao Wu


Corresponding author

Correspondence to Jiang Ming.

Additional information

A preliminary version of this paper appeared in the Proceedings of the 30th IFIP TC-11 SEC International Information Security and Privacy Conference (IFIP SEC’15), Hamburg, Germany, May 26-28, 2015.


About this article

Cite this article

Ming, J., Xu, D. & Wu, D. MalwareHunt: semantics-based malware diffing speedup by normalized basic block memoization. J Comput Virol Hack Tech 13, 167–178 (2017). https://doi.org/10.1007/s11416-016-0279-x


Received: 29 October 2015
Accepted: 04 May 2016
Published: 17 May 2016
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11416-016-0279-x
