An HMM and structural entropy based detector for Android malware: An empirical study
Introduction
With the growth of smartphone capabilities, malicious software targeting mobile devices is spreading rapidly and is becoming increasingly successful at evading detection.
In 2013, for the first time in malware history, the growth rate of mobile malware far exceeded that of new malware targeting PCs (Alcatel Lucent, 2013).
New kinds of malware appear continuously and at a very fast pace, and malware writers keep refining both their evasion techniques and their techniques for obtaining a tangible return from attacks, in terms of money or damage to the victim (F-Secure, 2015). Unfortunately, current solutions for protecting users from new threats are still inadequate (Fraunhofer AISEC, 2013, Visaggio, Mercaldo, 2015). For example, a kind of malware that is plaguing a huge number of devices while the authors are writing this paper is ransomware (InfoWorld, 2013), which encrypts the data stored on the device and holds it for ransom: the data are released only after the victim pays the requested amount, often in bitcoin.
In addition, several techniques allow mobile malware to evade signature-based detection (Ramachandran et al, 2012, Rastogi et al, 2013), which makes detection harder.
Meanwhile, simple forms of polymorphic attacks targeting the Android platform have been observed in the wild (Bayer et al., 2006): the main effect of polymorphism (and metamorphism) is that it renders signature-based detection ineffective.
Given this, there is an urgent need for new techniques to detect malware targeting mobile devices.
Recent papers (Attaluri et al, 2008, Baysa et al, 2013) have used structural entropy to detect metamorphic viruses and Hidden Markov Models (HMM) to classify them. We observed that, in certain respects, the way Android malware evolves makes it similar to metamorphic malware. Indeed, Android malware writers usually modify existing malware, adding new behaviors or merging together parts of the code of different existing malware. This also explains why Android malware is usually grouped in families: given this way of generating it, malware belonging to the same family shares common parts of code and behaviors.
Given these similarities, and given that structural entropy and HMM were able to successfully detect metamorphic viruses for personal computers (Attaluri et al, 2008, Baysa et al, 2013), in this paper we investigate whether these two techniques are also effective in recognizing Android malware and malware families. The fact that these techniques are effective on PC malware does not imply that they are also effective on Android malware.
Android programs have structures and features that make Android malware different from PC malware, because malware writers leverage these features to develop infection, evasion and payload-activation techniques that are not used in PC malware. Examples are dynamic loading, which allows malicious code to be added to an app at runtime; intent-based programming, which enables attacks such as service or activity hijacking (Chin et al., 2011); and the permission system, which limits the range of actions a malware can perform but allows attacks such as the update attack (Poeplau et al., 2014), a very effective and widespread anti-detection technique. These peculiarities of Android lead us to ask whether structural entropy and HMM remain effective when applied to malicious software written for Android. Moreover, to the best of the authors' knowledge, only one paper explores the effectiveness of HMM for detecting Android malware (Chen et al., 2014), but its authors obtained lower performance, used a smaller dataset than the one used in our experiments, and applied HMM to features different from those our method relies on.
The experiments we carried out demonstrate that HMM is effective in recognizing malware, with a precision of 0.96, while structural entropy successfully identifies the family a malware belongs to, with a precision of 0.98.
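Precision is presumably meant here in its standard sense, TP / (TP + FP): among the apps the classifier assigns to a class (malware, or a given family), the fraction that actually belong to it. A minimal sketch of that definition follows; the counts in the usage line are hypothetical and are not taken from the experiments.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Standard precision: TP / (TP + FP)."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("precision is undefined when nothing is flagged")
    return true_positives / flagged


# Hypothetical counts: 96 of 100 flagged apps are really malware.
print(precision(96, 4))  # 0.96
```

Precision alone ignores malware that the detector misses (false negatives), which is why it is usually reported together with recall.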
Identifying the family a malware belongs to is of primary importance, as it helps to discover new malware families (Khoo, Lio, 2011, Ma et al, 2006), to create models of provenance and lineage (Dumitras and Neamtiu, 2011), and to generate phylogeny models (Karim et al., 2005).
The paper proceeds as follows: the next section provides background notions about HMM and structural entropy and discusses the related work; Section 3 discusses the adoption of HMM and structural entropy methods to detect mobile malware; Section 4 discusses the experimental evaluation; Section 5 illustrates the results of experiments; Section 6 discusses the detection performance of HMM and structural entropy methods; Section 7 explains the threats to validity and, finally, conclusions are drawn in Section 8.
Background and related work
Before discussing the state of the art of malware detection using HMM and structural entropy, we recall the essential background about HMM and structural entropy.
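As background for the structural entropy method, the sketch below computes only its first step as we understand it from Baysa et al. (2013): an entropy profile obtained by sliding a window over the raw bytes of a file and computing the Shannon entropy of each window. The window and step sizes are illustrative assumptions, and the later steps (wavelet-based segmentation of the profile and comparison of the resulting segment sequences between two files) are not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())


def entropy_profile(data: bytes, window: int = 256, step: int = 128) -> list[float]:
    """Entropy of each sliding window over the file.

    window and step are illustrative values, not parameters from the paper.
    A file no longer than one window yields a single value.
    """
    if len(data) <= window:
        return [shannon_entropy(data)]
    return [shannon_entropy(data[i:i + window])
            for i in range(0, len(data) - window + 1, step)]


# Usage: highly repetitive bytes score 0.0, uniformly mixed bytes score 8.0.
print(entropy_profile(b"\x00" * 1024)[:3])
print(entropy_profile(bytes(range(256)) * 4)[:3])
```

Low-entropy regions (padding, repeated data) and high-entropy regions (compressed or encrypted payloads) show up as distinct plateaus in the profile, which is what the later segmentation step exploits.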
Adopting HMM and structural entropy malware detection for Android
To use HMM and structural entropy for detecting Android malware, we modified their original application to the detection of metamorphic PC malware, as described in this section.
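On the HMM side, detectors in the metamorphic-malware literature typically train an HMM on sequences extracted from known malware and then score an unseen sample by the log-likelihood of its sequence under the trained model. The sketch below shows only the scoring step, using the scaled forward algorithm. It assumes the app has already been turned into a list of integer symbols (for example, opcode indices from a disassembly; this is our assumption, since this section does not specify the observation alphabet), and the toy parameters are illustrative, not trained values. Training (Baum-Welch) is not shown.

```python
import math


def log_likelihood(obs, pi, A, B):
    """log P(obs | model) via the scaled forward algorithm.

    obs: sequence of observation symbol indices (e.g. opcode ids)
    pi:  initial state distribution, length N
    A:   N x N transitions, A[i][j] = P(next state j | state i)
    B:   N x M emissions,   B[i][k] = P(symbol k | state i)
    """
    if not obs:
        raise ValueError("empty observation sequence")
    n = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            # Propagate the normalized forward variables one step and emit obs[t].
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)  # P(obs) is the product of the scale factors
    return log_prob


def score(obs, pi, A, B):
    """Log-likelihood per symbol, so sequences of different length are comparable."""
    return log_likelihood(obs, pi, A, B) / len(obs)


# Toy 2-state model over a 3-symbol alphabet (illustrative values only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
print(math.exp(log_likelihood([0, 1, 0, 2], pi, A, B)))  # about 0.0096296
```

The number of hidden states N is simply the length of `pi`, so the same function scores models with 3, 4 or 5 states; dividing by the sequence length keeps scores of small and large apps on a comparable scale.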
Experimental evaluation: study definition
In this section we discuss the experiments we carried out to evaluate how effective HMM and structural entropy are in detecting Android malware and in correctly classifying the family a malware belongs to.
Analysis of results
The hypothesis test produced evidence that the considered features have different distributions in the control and experimental samples, as shown in Table 3: all p-values are below 0.001.
Summing up, the null hypothesis can be rejected for features f1, f2, f3 and f4. According to the hypothesis tests, both methods, HMM and structural entropy, are able to distinguish malware from trusted apps.
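This excerpt does not name the test behind Table 3. Purely as an illustration, the sketch below assumes a two-sided Mann-Whitney U (Wilcoxon rank-sum) test, a common nonparametric choice for checking whether a feature such as f1 is distributed differently in a malware (experimental) sample and a trusted-app (control) sample. It uses average ranks for ties and the normal approximation without tie correction, so it is only appropriate for reasonably large samples; `scipy.stats.mannwhitneyu` offers a more complete implementation. The helper name and the sample values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test with the normal approximation.

    Ties receive average ranks; the variance is not tie-corrected.
    Returns (U, p_value), where U counts pairs with x > y (ties count 1/2).
    """
    values = sorted(list(x) + list(y))
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        rank_of[values[i]] = (i + j) / 2 + 1  # average 1-based rank of the tie run
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p_value


# Toy feature values, not data from the study.
malware = [0.91, 0.88, 0.95, 0.79, 0.85, 0.93, 0.90, 0.87]
trusted = [0.42, 0.55, 0.38, 0.61, 0.47, 0.50, 0.44, 0.58]
u, p = mann_whitney_u(malware, trusted)
print(u, p)
```

A rank-based test is attractive here because it makes no normality assumption about the feature values, which for likelihood or similarity scores are often skewed.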
With regard to classification, we define the training set T,
Performance evaluation
In this section we discuss the performance of the structural-entropy-based and HMM-based detectors.
To measure the performance of the two methods, we used the Python function time.clock(), which returns the processor time, i.e. the CPU time, measured in seconds, that the process spends executing in order to perform the computation.
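`time.clock()` was deprecated in Python 3.3 and removed in Python 3.8, and on Windows it returned wall-clock time rather than processor time. Anyone repeating the measurements on a current interpreter could use `time.process_time()` instead, as in this minimal sketch; the helper name `cpu_seconds` is hypothetical.

```python
import time


def cpu_seconds(func, *args, **kwargs):
    """Call func(*args, **kwargs); return (result, CPU seconds used during the call).

    time.process_time() sums the user and system CPU time of the current
    process and does not advance while the process sleeps.
    """
    start = time.process_time()
    result = func(*args, **kwargs)
    return result, time.process_time() - start


# Usage: time a CPU-bound call (illustrative workload).
total, secs = cpu_seconds(sum, range(1_000_000))
print(total, f"{secs:.3f} s CPU")
```

Because process time excludes sleeping and waiting on I/O, it isolates the cost of the detector's computation from other load on the machine, at the price of ignoring wall-clock latency.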
The machine used to run the scripts and to take measurements was an Intel
Threats to validity
This section describes the threats that can affect the validity of our evaluation, namely construct validity, internal reliability, and external validity.
Conclusion and future work
In this paper we propose a detector for malicious mobile applications, consisting of a classifier that uses as features 3-, 4- and 5-state HMMs and structural entropy.
Current malware detection techniques are ineffective: they usually fail against zero-day attacks, and existing malware can easily evade current detectors.
This happens because Android malware is becoming increasingly complex, and it is acquiring characteristics that make it closer to
References (54)
The illustrated wavelet transform handbook: introductory theory and applications in science, engineering, medicine and finance (2002)
Kindsight security labs malware report – q4
Heldroid: dissecting and detecting mobile ransomware
An assembler/disassembler for android's dex format
Identifying unknown android malware with feature extractions and classification techniques
Drebin: efficient and explainable detection of android malware in your pocket
Profile Hidden Markov Models and metamorphic virus detection. J Comput Virol Hacking Tech (2008)
TTanalyze: a tool for analyzing malware
Structural entropy and metamorphic malware. J Comput Virol Hacking Tech (2013)
Fundamentals in information theory and coding (2011)
Dissecting the android bouncer
A classifier of malicious android applications
Detecting android malware using sequences of system calls
Effectiveness of Opcode ngrams for detection of multi family android malware
MAST: triage for market-scale mobile malware analysis
A Hidden Markov Model detection of malicious android applications at runtime
Analyzing inter-application communication in android
Detecting machine-morphed malware variants via engine attribution. J Comput Virol Hacking Tech
Droidlegacy: automated familial classification of android malware
Experimental challenges in cyber security: a story of provenance and lineage for malware
Mobile threat report
Androsimilar: robust statistical feature signature for android malware detection
Apposcopy: semantics-based detection of android malware through static analysis
On the effectiveness of malware protection on android
Update: McAfee: cyber criminals using android malware and ransomware the most
Malware phylogeny generation using permutations of code
Unity in diversity: phylogenetic-inspired techniques for reverse engineering and detection of malware families