Elsevier

Computers & Security

Volume 61, August 2016, Pages 1-18

An HMM and structural entropy based detector for Android malware: An empirical study

https://doi.org/10.1016/j.cose.2016.04.009

Abstract

Smartphones are becoming more and more popular and, as a consequence, malware writers are increasingly engaged in developing new threats and propagating them through official and third-party markets. Beyond the propagation vectors, malware is also quickly evolving the techniques it uses to infect victims and to hide its malicious nature from antimalware scanning. Examples range from SMS Trojans to legitimate applications repackaged with a malicious payload, and from AES-encrypted root exploits to the dynamic loading of a payload retrieved from a remote server. As a result, malicious code is becoming harder and harder to detect.

In this paper we experimentally evaluate two techniques for detecting Android malware: the first is based on Hidden Markov Models (HMM), while the second exploits structural entropy. Both techniques have been successfully applied to detect PC viruses in previous works, and only one work in the literature analyzes the application of HMM to the detection of Android malware. We demonstrate that these methods, which proved effective for PC viruses, are also successful in detecting and classifying mobile malware.

Our results are promising: we obtain a precision of 0.96 in discriminating malware applications, and a precision of 0.978 in identifying the malware family.

Introduction

With the growth of smartphone capabilities, malicious software targeting mobile devices is spreading rapidly and is becoming more and more successful at evading detection.

In 2013, the growth rate of mobile malware was far greater than the growth rate of new malware targeting PCs (Alcatel Lucent, 2013), for the first time in malware history.

New kinds of malware appear continuously at a very fast pace. Malware writers refine both their evasion techniques and their techniques for obtaining tangible returns from attacks, in terms of money or damage to the victim (F-Secure, 2015). Unfortunately, current solutions for protecting users from new threats are still inadequate (Fraunhofer AISEC, 2013, Visaggio, Mercaldo, 2015). For example, one type of malware plaguing a huge number of devices at the time of writing is ransomware (InfoWorld, 2013). Ransomware encrypts the data stored on the device and holds it for ransom, and the data is released only after the victim pays the required amount, often in bitcoin.

In addition, several techniques allow mobile malware to evade signature-based detection (Ramachandran et al, 2012, Rastogi et al, 2013), which makes detection even harder.

Meanwhile, simple forms of polymorphic attacks targeting the Android platform have been seen in the wild (Bayer et al., 2006). The main effect of polymorphism (and metamorphism) is that signature-based detection becomes ineffective.

Given this, there is an urgent need for new techniques to detect malware targeting mobile devices.

Recent papers (Attaluri et al, 2008, Baysa et al, 2013) have used structural entropy to detect metamorphic viruses and Hidden Markov Models (HMM) to classify them. We observed that, in certain respects, the way Android malware evolves makes it similar to metamorphic malware. Android malware writers tend to modify existing malware, either by adding new behaviors or by merging parts of the code of different existing malware. This also explains why Android malware is usually grouped in families: because Android malware is generated in this way, malware belonging to the same family shares common parts of code and behaviors.

Given these similarities, and given that structural entropy and HMM successfully detected metamorphic viruses on personal computers (Attaluri et al, 2008, Baysa et al, 2013), in this paper we investigate whether these two techniques are also effective in recognizing Android malware and malware families. That these techniques work against personal computer malware does not imply that they also work against Android malware.

Android has program structures and features that set Android malware apart from PC malware, because malware writers leverage these features to build infection, evasion and payload-activation techniques that PC malware does not use. Examples include the following:

- Dynamic loading allows malicious code to be added to an app at runtime.
- Intent-based programming enables attacks such as service or activity hijacking (Chin et al., 2011).
- The permission system limits the range of actions a malware can perform, but it allows attacks such as the update attack (Poeplau et al., 2014), a very effective and widespread anti-detection technique.

These peculiarities of Android lead us to ask whether structural entropy and HMM remain effective when applied to malicious software written for Android. Moreover, to the best of the authors' knowledge, only one paper explores the effectiveness of HMM for detecting Android malware (Chen et al., 2014). Its authors obtained lower performance, used a smaller dataset than the one used in our experiments, and applied HMM to features different from the ones our method relies on.

Our experiments show that HMM is effective in recognizing malware, with a precision of 0.96, while structural entropy successfully identifies the family a malware belongs to, with a precision of 0.98.
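The precision figures above use the standard definition: precision = TP / (TP + FP), the fraction of samples flagged as positive (e.g. "malware") that really are positive. The following minimal Python sketch illustrates that definition. The function names and the counts in the example are hypothetical and are not taken from the paper's data.

```python
from typing import Sequence


def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP): share of flagged samples that are truly positive."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("precision is undefined when nothing is flagged")
    return true_positives / flagged


def precision_from_labels(y_true: Sequence[str], y_pred: Sequence[str],
                          positive: str = "malware") -> float:
    """Count TP and FP for the given positive label, then compute precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return precision(tp, fp)


# Hypothetical counts: 96 correctly flagged apps and 4 trusted apps wrongly flagged
print(precision(96, 4))
```

For family identification, the same function can be applied once per family, treating that family as the positive label.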

Identifying the family a malware belongs to is of primary importance. It helps to discover new malware families (Khoo, Lio, 2011, Ma et al, 2006), to create models of provenance and lineage (Dumitras and Neamtiu, 2011), and to generate phylogeny models (Karim et al., 2005).

The paper proceeds as follows: the next section provides background notions about HMM and structural entropy and discusses the related work; Section 3 discusses the adoption of HMM and structural entropy methods to detect mobile malware; Section 4 discusses the experimental evaluation; Section 5 illustrates the results of experiments; Section 6 discusses the detection performance of HMM and structural entropy methods; Section 7 explains the threats to validity and, finally, conclusions are drawn in Section 8.


Background and related work

Before discussing the state of the art of malware detection using HMM and structural entropy, we recall the essential background about HMM and structural entropy.
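The HMM background is only summarized in this snippet. The following minimal Python sketch recalls how a trained HMM λ = (A, B, π) turns a sequence of observation symbols into a score: it implements the textbook scaled forward algorithm, which computes log P(O | λ). The observation alphabet is left abstract (small integers), because this snippet does not state which symbols the authors extract from Android apps. Training the model (e.g. with Baum–Welch) is omitted, and the toy model values are hypothetical.

```python
import math
from typing import Sequence


def log_likelihood(obs: Sequence[int], pi: Sequence[float],
                   A: Sequence[Sequence[float]], B: Sequence[Sequence[float]]) -> float:
    """Scaled forward algorithm: log P(obs | lambda) for lambda = (A, B, pi).

    pi[i]   : probability of starting in hidden state i
    A[i][j] : probability of moving from hidden state i to hidden state j
    B[i][k] : probability that hidden state i emits observation symbol k
    """
    if not obs:
        return 0.0  # the empty sequence has probability 1
    n = len(pi)
    # Initialisation: alpha_0(i) = pi_i * b_i(o_0)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        c = sum(alpha)
        if c == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        alpha = [a / c for a in alpha]  # rescale to avoid underflow on long sequences
        log_prob += math.log(c)  # P(obs) is the product of the scaling factors
    return log_prob


# Toy 2-state model over a 3-symbol alphabet (all numbers hypothetical)
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
seq = [0, 1, 2, 2, 1]
print(log_likelihood(seq, pi, A, B) / len(seq))  # per-symbol log-likelihood
```

Samples of different size produce sequences of different length. A common choice in HMM-based malware scoring is therefore to divide the log-likelihood by the sequence length, as the last line does, so that scores can be compared across samples.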

Adopting HMM and structural entropy malware detection for Android

To use HMM and structural entropy for detecting Android malware, we modified their original application to metamorphic PC malware detection, as described in this section.
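This snippet does not describe the modification itself. The structural entropy approach of Baysa et al. starts from an entropy profile of a file, computed over consecutive windows; in their approach, the profile is then segmented (with wavelets) and the segments are compared. The following minimal Python sketch shows only that first step, applied to raw bytes (for instance, the bytes of an app's classes.dex file). The window and step sizes are hypothetical, and the segmentation and comparison stages are omitted.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())


def entropy_series(data: bytes, window: int = 256, step: int = 128) -> list[float]:
    """Entropy of each fixed-size (possibly overlapping) window across the data.

    window and step are hypothetical defaults. Trailing bytes that do not fill
    a complete window after the last step are ignored.
    """
    if len(data) <= window:
        return [shannon_entropy(data)]
    return [shannon_entropy(data[i:i + window])
            for i in range(0, len(data) - window + 1, step)]
```

Highly compressed or encrypted regions push the window entropy towards 8 bits per byte, while padding and repetitive tables stay low. The resulting profile is what a segmentation step would then operate on.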

Experimental evaluation: study definition

In this section we discuss the experiments we carried out to evaluate how effective HMM and structural entropy are in detecting Android malware and in correctly classifying the family a malware belongs to.

Analysis of results

The hypothesis test produced evidence that the considered features have different distributions in the control and experimental samples, as shown in Table 3: all the p-values are below 0.001.

Summing up, the null hypothesis can be rejected for features f1, f2, f3 and f4. According to the hypothesis tests, both methods, HMM and structural entropy, are able to distinguish malware from trusted apps.
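This part of the text does not name the test behind Table 3. Purely as an illustrative assumption, the following Python sketch uses a two-sided permutation test on the difference in means of one feature (such as f1) between the malware sample and the trusted sample. It is not necessarily the test the authors ran. The function name and its parameters are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(malware: Sequence[float], trusted: Sequence[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation-test p-value for a difference in means between two samples."""
    rng = random.Random(seed)
    observed = abs(mean(malware) - mean(trusted))
    pooled = list(malware) + list(trusted)
    k = len(malware)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        diff = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if diff >= observed:
            extreme += 1
    # The +1 correction avoids reporting an impossible p-value of exactly 0
    return (extreme + 1) / (n_resamples + 1)
```

With this estimator, the smallest attainable p-value is 1 / (n_resamples + 1). Claiming p < 0.001 therefore requires well over 1,000 resamples; the default of 10,000 is enough.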

With regard to classification, we define the training set T,

Performance evaluation

In this section we discuss the performance of the structural-entropy-based and HMM-based detectors.

To measure the performance of the two methods, we used the Python function time.clock(), which returns the processor time. Unlike elapsed wall-clock time, processor time counts only the time the processor spends executing the process: it is the CPU time, measured in seconds, that the process requires to perform the computation.
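time.clock() was deprecated in Python 3.3 and removed in Python 3.8. On current interpreters, the closest equivalent for CPU time is time.process_time(), which returns the sum of the user and system CPU time of the current process and excludes time spent sleeping. The following Python sketch assumes that substitution. busy_work is a hypothetical stand-in for the detector's scoring step.

```python
import time


def cpu_seconds(fn, *args, **kwargs):
    """Return (result, CPU seconds) for one call to fn, measured with time.process_time()."""
    start = time.process_time()
    result = fn(*args, **kwargs)
    elapsed = time.process_time() - start
    return result, elapsed


def busy_work(n: int) -> int:
    # Hypothetical stand-in for feature extraction / scoring of one app
    return sum(i * i for i in range(n))


result, secs = cpu_seconds(busy_work, 1_000_000)
print(f"result={result} cpu={secs:.4f}s")
```

Because process time ignores sleeping and waiting, it isolates the computational cost of a method from I/O and scheduling noise, which is what a comparison between the two detectors needs.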

The machine used to run the scripts and to take measurements was an Intel

Threats to validity

This section describes the threats that can affect the validity of our evaluation, namely threats to construct validity, internal validity, reliability, and external validity.

Conclusion and future work

In this paper we propose a detector for malicious mobile applications. It consists of a classifier that uses 3-, 4- and 5-state HMMs and structural entropy as features.

Current malware detection techniques are ineffective: they usually fail against zero-day attacks, and existing malware can easily evade current detectors.

This happens because Android malware is becoming increasingly complex, and it is acquiring characteristics that make it closer to

References (54)

  • P.S. Addison

    The illustrated wavelet transform handbook: introductory theory and applications in science, engineering, medicine and finance

    (2002)
  • Alcatel Lucent

    Kindsight Security Labs malware report – Q4

  • N. Andronio et al.

    HelDroid: dissecting and detecting mobile ransomware

  • Anon.

    An assembler/disassembler for Android's dex format

  • L. Apvrille et al.

    Identifying unknown Android malware with feature extractions and classification techniques

  • D. Arp et al.

    Drebin: efficient and explainable detection of Android malware in your pocket

  • S. Attaluri et al.

    Profile Hidden Markov Models and metamorphic virus detection

    J Comput Virol Hacking Tech

    (2008)
  • U. Bayer et al.

    TTanalyze: a tool for analyzing malware

  • D. Baysa et al.

    Structural entropy and metamorphic malware

    J Comput Virol Hacking Tech

    (2013)
  • M. Borda

    Fundamentals in information theory and coding

    (2011)
  • Busticati Productions Presents

    Dissecting the Android Bouncer

  • G. Canfora et al.

    A classifier of malicious Android applications

  • G. Canfora et al.

    Detecting Android malware using sequences of system calls

  • G. Canfora et al.

    Effectiveness of opcode ngrams for detection of multi family Android malware

  • S. Chakradeo et al.

    MAST: triage for market-scale mobile malware analysis

  • Y. Chen et al.

    A Hidden Markov Model detection of malicious Android applications at runtime

  • E. Chin et al.

    Analyzing inter-application communication in android

  • R. Chouchane et al.

    Detecting machine-morphed malware variants via engine attribution

    J Comput Virol Hacking Tech

    (2013)
  • L. Deshotels et al.

    DroidLegacy: automated familial classification of Android malware

  • T. Dumitras et al.

    Experimental challenges in cyber security: a story of provenance and lineage for malware

    (2011)
  • F-Secure

    Mobile threat report

  • P. Faruki et al.

    AndroSimilar: robust statistical feature signature for Android malware detection

  • Y. Feng et al.

    Apposcopy: semantics-based detection of Android malware through static analysis

  • Fraunhofer AISEC

    On the effectiveness of malware protection on android

  • InfoWorld

    Update: McAfee: cyber criminals using Android malware and ransomware the most

  • M.E. Karim et al.

    Malware phylogeny generation using permutations of code

    (2005)
  • W. Khoo et al.

    Unity in diversity: phylogenetic-inspired techniques for reverse engineering and detection of malware families
