Elsevier

Computers & Security

Volume 48, February 2015, Pages 212-233
A framework for metamorphic malware analysis and real-time detection

https://doi.org/10.1016/j.cose.2014.10.011

Abstract

Metamorphism is a technique that mutates binary code using different obfuscations. Writing new metamorphic malware is difficult, so malware writers generally reuse old malware. To evade detection, they change the obfuscations (syntax) more than the behavior (semantics) of the new variant. Motivated by this assumption, this paper presents a new framework named MARD for Metamorphic Malware Analysis and Real-Time Detection. As part of the framework, to build a behavioral signature and detect metamorphic malware in real time, we propose two novel techniques, named ACFG (Annotated Control Flow Graph) and SWOD-CFWeight (Sliding Window of Difference and Control Flow Weight). Unlike other techniques, ACFG provides faster matching of CFGs without compromising detection accuracy; it can handle malware with smaller CFGs, and it contains more information than a CFG and hence provides more accuracy. SWOD-CFWeight mitigates key issues in current techniques related to changes in opcode frequencies, such as those caused by the use of different compilers, compiler optimizations, operating systems and obfuscations. The size of SWOD can change, which lets anti-malware tool developers select appropriate parameter values to further optimize malware detection. CFWeight captures the control flow semantics of a program to an extent that helps detect metamorphic malware in real time. Experimental evaluation of the two proposed techniques, using an existing dataset, achieved detection rates in the range 94%–99.6%. Compared to ACFG, SWOD-CFWeight significantly improves detection time and is suitable where detection time matters more, as in real-time (practical) anti-malware applications.

Section snippets

Introduction and motivation

End point security is often the last defense against a security threat. An end point can be a desktop, a server, a laptop, a kiosk or a mobile device that connects to a network (Internet). Recent statistics by the International Telecommunications Union (ITU, 2013) show that the number of Internet users (i.e., people connecting to the Internet using these end points) in the world has increased from 20% in 2006 to 40% (almost 2.7 billion in total) in 2013. A study carried out by Symantec about

Related works

This section discusses previous research on malware detection. We cover only recent academic research that claims to detect metamorphic malware or intends to extend its approach to do so. We divide this research into three groups based on the type of analysis performed for malware detection: control flow analysis, information flow analysis, and opcode-based analysis.

Framework overview

Fig. 1 gives an overview of our proposed framework for Metamorphic Malware Analysis and Real-Time Detection (MARD). First, a training dataset (called Malware Templates in Fig. 1) is built from the malware training samples. After a program (sample) is translated to MAIL and then to a behavioral signature (generated using one of the two proposed techniques described in Sections 4 and 5), the Similarity Detector (Fig. 1) detects the presence
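The flow outlined in this overview (build templates from training samples, turn a new sample into a signature, compare it against the templates) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the statement labels, the statement-pair signature, the Jaccard similarity, the 0.5 threshold and the names `build_signature` and `detect` are hypothetical and are not the paper's ACFG or SWOD-CFWeight techniques.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Signature = FrozenSet[str]


def build_signature(statements: List[str]) -> Signature:
    """Stand-in signature: the set of adjacent statement-label pairs.

    Assumption: the framework derives signatures from MAIL using ACFG or
    SWOD-CFWeight; pair sets are used here only to keep the sketch small.
    """
    return frozenset(f"{a}->{b}" for a, b in zip(statements, statements[1:]))


def jaccard(a: Signature, b: Signature) -> float:
    """Jaccard similarity of two signatures (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect(
    sample: Signature,
    templates: Dict[str, Signature],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[Optional[str], float]:
    """Return the best-matching template name, or None if below threshold."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name, template in templates.items():
        score = jaccard(sample, template)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical training data: one malware template built from statement labels.
templates = {"family_a": build_signature(["ASSIGN", "CALL", "JUMP", "ASSIGN"])}

# A variant with one extra statement inserted, and an unrelated benign sample.
variant = build_signature(["ASSIGN", "ASSIGN", "CALL", "JUMP", "ASSIGN"])
benign = build_signature(["ASSIGN", "ASSIGN", "ASSIGN"])

variant_result = detect(variant, templates)
benign_result = detect(benign, templates)
```

In this toy setup the variant shares three of four statement pairs with the template and is flagged, while the benign sample shares none and is not.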

ACFG detection technique

Current techniques (Eskandari and Hashemi, 2012a; Eskandari and Hashemi, 2012b; Anju et al., 2010; Song and Touili, 2012a; Vinod et al., 2009; Bruschi et al., 2006; Cesare and Xiang, 2011; Guo et al., 2010; Kirda et al., 2006; Flake, 2004) that use CFGs for malware detection are either compute intensive or have poor detection rates, and cannot handle malware with smaller CFGs. We propose, in this paper, a new technique named Annotated Control Flow Graph (ACFG) that can enhance the detection of

Rationale and overview

Techniques based on behavior analysis (Faruki et al., 2012; Yin and Song, 2013; Song and Touili, 2012a; Vinod et al., 2012; Ghiasi et al., 2012; Bruschi et al., 2006; Guo et al., 2010; Kirda et al., 2006; Flake, 2004) are used to detect metamorphic malware, but are compute intensive and are not suitable for real-time detection. A subset of other techniques (Runwal et al., 2012; Toderici and Stamp, 2013; Rad et al., 2012; Santos et al., 2013; Shabtai et al., 2012; Wong and Stamp, 2006; Austin

Evaluation, analysis and comparison

In this section we evaluate the correctness and efficiency of our proposed techniques, and present the results together with discussion and analysis. We also compare the proposed framework with other malware detection systems.
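For readers unfamiliar with how a detection rate is usually computed, the sketch below derives the detection rate (true positive rate) and the false positive rate from labeled predictions. It is a generic illustration with made-up labels. The function name `detection_metrics` and the sample data are assumptions and do not reproduce the paper's evaluation code or dataset.

```python
from typing import Dict, Sequence


def detection_metrics(
    actual: Sequence[bool], predicted: Sequence[bool]
) -> Dict[str, float]:
    """Compute detection rate (true positive rate) and false positive rate.

    actual[i] is True when sample i is malware; predicted[i] is True when
    the detector flagged sample i as malware.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # malware, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # malware, missed
    fp = sum(1 for a, p in pairs if not a and p)      # benign, flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # benign, passed

    return {
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical labels: four malware samples followed by four benign samples.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
metrics = detection_metrics(actual, predicted)
```

With these made-up labels, three of four malware samples are detected and one of four benign samples is wrongly flagged.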

Conclusion and future work

In this paper, we have presented a new metamorphic malware detection framework named MARD that implements the two novel techniques proposed in this paper, ACFG and SWOD-CFWeight, and have shown through experimental evaluation its effectiveness for metamorphic malware analysis and real-time detection. We have also compared MARD with other such detection systems. With the proposed techniques, MARD ranks among the top-performing systems and, unlike the others, is fully automatic, supports malware detection

References (56)

  • M. Eskandari et al. A graph mining approach for detecting unknown malwares. J Vis Lang Comput (Jun. 2012)
  • I. Santos et al. Opcode sequences as representation of executables for data-mining-based unknown malware detection. Inf Sci (2013)
  • A.V. Aho et al. Compilers: principles, techniques, and tools (2006)
  • S. Alam. Examples of CFGs before and after shrinking (2013)
  • S. Alam. MAIL: malware analysis intermediate language (2013)
  • S. Alam et al. MAIL: malware analysis intermediate language – a step towards automating and optimizing malware detection
  • S. Alam et al. MARD: a framework for metamorphic malware analysis and real-time detection
  • S.S. Anju et al. Malware detection using assembly code and control flow graph optimization
  • T.H. Austin et al. Exploring hidden Markov models for virus analysis: a semantic approach
  • G. Balakrishnan et al. WYSINWYX: What You See Is Not What You eXecute (2005)
  • D. Baysa et al. Structural entropy and metamorphic malware. J Comput Virol Hacking Tech (2013)
  • J.-M. Borello et al. Code obfuscation techniques for metamorphic viruses. J Comput Virol (2008)
  • D. Bruschi et al. Detecting self-mutating malware using control-flow graph matching
  • G. Canfora et al. Static analysis for the detection of metamorphic computer viruses using repeated-instructions counting heuristics. J Comput Virol Hacking Tech (2014)
  • S. Cesare et al. Malware variant detection using similarity search over sets of control flow graphs
  • C. Collberg et al. Manufacturing cheap, resilient, and stealthy opaque constructs
  • T.H. Cormen et al. Introduction to algorithms (2009)
  • S. Deshpande et al. Eigenvalue analysis for metamorphic detection. J Comput Virol Hacking Tech (2014)
  • M. Eskandari et al. ECFGM: enriched control flow graph miner for unknown vicious infected code detection. J Comput Virol (Aug. 2012)
  • P. Faruki et al. Mining control flow graph as API call-grams to detect portable executable malware
  • E. Filiol et al. A statistical model for Undecidable Viral detection. J Comput Virol (2007)
  • H. Flake. Structural comparison of executable objects
  • G2. Second generation virus generator, http://vxheaven.org/vx.php?id=tg00 [last accessed...
  • M.R. Garey et al. Computers and intractability; a guide to the theory of NP-completeness (1990)
  • M. Ghiasi et al. Dynamic malware detection using registers values set analysis
  • J.L. Gross et al. Graph theory and its applications (2005)
  • H. Guo et al. Hero: a novel malware detection framework based on binary translation
  • ITU. The world in 2013: ICT Facts and figures (2013)

Shahid Alam is currently a PhD student in the Computer Science Department at University of Victoria, BC. He received his MASc degree from Carleton University, Ottawa, ON, in 2007. He has more than 5 years of experience working in the software industry. His research interests include programming languages, compilers, software engineering and binary analysis for software security. Currently he is looking into applying compiler, binary analysis and artificial intelligence techniques to automate and optimize malware analysis and detection.

Nigel Horspool is a Professor of computer science at the University of Victoria. He received an M.Sc degree and a Ph.D. in Computer Science from the University of Toronto in 1972 and 1976, respectively. From 1976 until 1983, he was an Assistant Professor and then an Associate Professor in the School of Computer Science at McGill University in Montreal. He joined the Computer Science Department at the University of Victoria in 1983. His research interests are mostly concerned with the compilation and implementation of programming languages. He is the author of the book C Programming in the Berkeley UNIX Environment and co-author of the book C# Concisely.

Issa Traore has been with the faculty of the Electrical and Computer Engineering Department of the University of Victoria since 1999, where he is currently a Professor. Dr. Traore is also the founder and Director of the Information Security and Object Technology (ISOT) Lab (www.isot.ece.uvic.ca). He obtained a PhD in Software Engineering from the Institut National Polytechnique of Toulouse, France, in 1998. His main research interests are biometrics technologies, intrusion detection systems, and software security.

Ibrahim Sogukpinar received his B.Sc. degree in Electronic and Communications Engineering from Technical University of Istanbul in 1982, and his M.Sc. degree in Computer and Control Engineering from Technical University of Istanbul in 1985. He received his Ph.D. degree in Computer and Control Engineering from Technical University of Istanbul in 1995. Currently he is the head of the Computer Engineering Department at Gebze Institute of Technology. His main research areas are information security, computer networks, applications of information systems and computer vision.
