Machine learning-assisted signature and heuristic-based detection of malwares in Android devices

https://doi.org/10.1016/j.compeleceng.2017.11.028

Abstract

Malware detection is an important factor in the security of smart devices. However, currently used signature-based methods cannot accurately detect zero-day attacks and polymorphic viruses. In this context, an efficient hybrid framework is presented for detecting malware in Android Apps. The proposed framework considers both signature-based and heuristic-based analysis of Android Apps. We reverse engineered the Android Apps to extract manifest files and binaries, and employed state-of-the-art machine learning algorithms to detect malware efficiently. For this purpose, a rigorous set of experiments was performed using various classifiers such as SVM, Decision Tree, W-J48 and KNN. We observed that SVM for binaries and KNN for manifest.xml files are the most suitable options for robustly detecting malware in Android devices. The proposed framework is tested on benchmark datasets, and the results show improved accuracy in malware detection.

Introduction

In recent years, smart devices have become a main means of communication. The mobile phone entered the market as a regular handset and has since evolved into the smartphone through significant improvements in technology. The number of smartphone users is greater than ever because of the facilities these devices offer. In the past, mobile phones were used only for phone calls and SMS messages. The situation has changed, and mobile phones are now used as cameras, music players, tablet PCs, web browsers, etc. Today's mobile phones are equipped with multiple sensors and with enhanced memory and processing power, enabling them to be used as personal computers.

A smartphone requires applications and an Operating System (OS) to serve its users. Different operating systems are available for smartphones, such as iOS, Windows, BlackBerry OS and Android. Android is the most popular among these platforms. Approximately 1.3 million Android devices are activated every day, according to Google chairman Eric Schmidt [1]. Android provides its users with rich media support, an optimized graphics system and a powerful browser. Apart from this, Android OS also supports 24 h GPS tracking, video camera, compass and 3D accelerometer. It offers rich Application Program Interfaces (APIs) for location and map functions. Users can easily control or process Google Maps on Android devices and access location at low cost. Due to the high usage of smartphones, every user is exposed to the threat of unwanted and malicious applications. As the number of Android users increases, malware authors are busy writing malicious applications. Recent research has shown that Android Apps are repackaged with malicious ELF binaries to hide calls to external binaries. Similarly, researchers are trying to find the best malware detection methods, such as memory forensic techniques and secure data communication methods, that can protect Android devices.

Whenever a user wants to install an application from the Play Store, the application is downloaded first and then installed after the user accepts all permissions. A user cannot install an application without accepting all permissions required by the developer (who may be a hacker). Hackers usually ask for permissions through which they can access the user's camera, audio, text messages and other private information. Users are unaware of the hackers' purpose and accept all permissions to install the application. In this way, they become victims of the attack. Moreover, hackers can make changes in constant strings to attack mobile devices, as explained in the feature extraction part of Section 3.

Several techniques for detecting malware have been proposed in the literature, which can be divided into two broad classes: Static and Dynamic Analysis-based methods. Dynamic Analysis, also known as behaviour-based analysis, collects information from the OS at runtime, such as system calls, network access, and file and memory modifications. Hybrid apps combine elements of both native apps and web apps. Like native apps, they live in an app store and can take advantage of the many device features available. Like web apps, they rely on HTML being rendered in a browser, with the caveat that the browser is embedded within the app.

Often, companies build hybrid apps as wrappers for an existing web page. In that way, they hope to gain a presence in the app store without spending significant effort on developing a different app. Hybrid apps are also popular because they allow cross-platform development and thus significantly reduce development costs: the same HTML code components can be reused on different mobile operating systems. Tools such as PhoneGap and Sencha Touch allow people to design and code across platforms, using the power of HTML. However, developers rush to exploit off-the-shelf libraries in hybrid apps. Great new features are freely available and are adopted without fully understanding or addressing the security implications, which increases the chances of malware penetration in mobile devices.

In Static Analysis (signature-based analysis), information about the App and its expected behaviour consists of explicit and implicit observations in its binary/source code. Static Analysis methods are fast and effective, but various techniques can be used to evade Static Analysis and thus render it unable to cope with polymorphic malware. A number of signature-based and behaviour-based detection tools are available on the Play Store for detecting malicious Android applications. A recent study has shown that signature-based malware detection tools work only up to a certain level. They become ineffective when malware authors make changes in apps. Such signature-based tools and anti-virus products cannot protect Android users.

Since Android is an open-source and extensible platform, it allows us to extract as many features as we would like. This enables richer detection capabilities that do not rely merely on the standard call records or power consumption patterns. The proposed method is novel in that it evaluates the ability to detect malicious activity on an Android device by employing Machine Learning algorithms on a variety of monitored features such as permissions, providers, intent filters, process names and constant strings extracted from Android Apps. The proposed malware detection technique can also be used in diverse environments such as BlackBerry, iOS, etc. This work aims to find solutions to the following challenges:

  • How to develop a malware detection system that can adapt to any kind of malware?

  • How to detect malware before actual installation?

  • How to scrutinize hybrid mobile apps for possible malware threat?

  • How to warn Android users about malware after a download?

  • Why is extracting combined features of Android Apps a better way to detect malware than signature-based and behaviour-based techniques?

Selecting good features from Android applications and combining them can lead to a robust malware detection system. Most malware detection techniques based on dynamic analysis detect malware after an Android App has been installed, by which time the device may already be affected. To install an application, the user has to grant all requested permissions, including malicious ones. A technique that identifies malware only after the device has been affected is not secure. We perform static analysis in the proposed malware detection technique to detect malware after the application is downloaded but before it is installed. In this way, the security of the smartphone is not compromised. When a user downloads an App from the Play Store and the proposed technique identifies it as malicious, the detection system raises a prompt and informs the user about the malicious app before installation.

Contributions of this work are:

  • A generic malware detection framework that is adaptive to different types of malware.

  • Real-time detection of malware using a combination of string analysis and permissions.

  • Performance evaluation methods to calculate precision and accuracy of malware detection in Android Apps.
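The precision and accuracy named in the last contribution are the standard confusion-matrix metrics. As a minimal sketch, assuming labels of 1 for malware and 0 for benign (the function name and the sample labels are illustrative and are not taken from the paper), they can be computed as follows:

```python
def precision_and_accuracy(y_true, y_pred):
    """Return (precision, accuracy) for binary labels: 1 = malware, 0 = benign."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    # True positives: malware correctly flagged as malware.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False positives: benign apps wrongly flagged as malware.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = correct / len(y_true) if y_true else 0.0
    return precision, accuracy


# Hypothetical ground truth and predictions for six apps.
truth = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
precision, accuracy = precision_and_accuracy(truth, predicted)
```

Precision penalizes benign apps that are wrongly flagged, while accuracy counts every correct decision, so the two can diverge on imbalanced datasets.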

We have used a hybrid approach that is both static and dynamic. The static approach overcomes the drawbacks of the dynamic approach, while the dynamic approach covers the deficiencies of the static approach. Because the model is hybrid, all kinds of malware can be detected with the proposed approach. In this way, our generic malware detection approach can detect different types of malware.

Before an application is installed, our proposed method compares the constant strings of the downloaded Android Package Kit (APK) with the constant strings of known malware applications. The APK contains all components of an Android application and is used for the distribution and installation of mobile applications. If the constant strings of a malware application and the downloaded APK match, the user is informed that the application is malicious. Secondly, our extracted malicious keywords are compared with the manifest.xml file of the application to check whether the application is malware or legitimate. All these processes are executed before the APK file is installed. In this manner, the proposed system can detect malware before an application is installed.
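The two pre-installation checks can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes that the constant strings have already been extracted from the APK, that the manifest has already been decoded to plain-text XML by a reverse-engineering tool, that the malicious keywords are permission names, and that an app is flagged when its strings or permissions overlap the known-bad sets. All function names and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by Android manifest attributes such as android:name.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def requested_permissions(manifest_xml):
    """Return the set of permission names declared in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    return {
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
        if elem.get(ANDROID_NS + "name")
    }


def pre_install_check(apk_strings, manifest_xml, malware_strings, malicious_keywords):
    """Flag an app when its strings or permissions overlap the known-bad sets."""
    string_hits = set(apk_strings) & set(malware_strings)
    keyword_hits = requested_permissions(manifest_xml) & set(malicious_keywords)
    return {
        "malicious": bool(string_hits or keyword_hits),
        "string_hits": string_hits,
        "keyword_hits": keyword_hits,
    }


# Hypothetical inputs, for illustration only.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.torch">
  <uses-permission android:name="android.permission.CAMERA"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

report = pre_install_check(
    apk_strings={"hello", "http://bad.example/payload"},
    manifest_xml=MANIFEST,
    malware_strings={"http://bad.example/payload"},
    malicious_keywords={"android.permission.SEND_SMS"},
)
```

Exact set intersection is the simplest possible matching rule. A real system would more likely feed these overlaps into a classifier as features than treat any single hit as conclusive.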

The remainder of this paper is organized as follows. Section 2 summarizes related work, and Section 3 introduces the proposed framework. Section 4 describes the evaluation scheme and experimental setup. Finally, Section 5 concludes the paper.

Literature review

Researchers have presented various methods for the detection of malware in Android OS. In this section, an overview of the existing malware detection frameworks is provided.

Methodology

To overcome the problems of the static and dynamic malware detection techniques proposed in the literature and summarized in Section 2, this section describes the details of a new hybrid malware detection model for Android applications. The proposed model uses both static and dynamic analysis for malware detection. Malware that cannot be detected by the static approach is detected by the dynamic approach, and malware that cannot be detected by the dynamic approach is detected by the static approach.
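The abstract reports KNN as the most suitable classifier for manifest.xml features. As a rough illustration of how such a classifier could operate on binary permission vectors, the sketch below implements k-nearest neighbours in plain Python. The Hamming distance, the value k = 3, the feature layout and the training samples are all assumptions made for this example; the text shown here does not specify them.

```python
from collections import Counter


def hamming(a, b):
    """Count positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors closest to query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda pair: hamming(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature layout: [SEND_SMS, READ_CONTACTS, INTERNET, CAMERA]
TRAIN = [
    ([1, 1, 1, 0], "malware"),
    ([1, 0, 1, 0], "malware"),
    ([1, 1, 1, 1], "malware"),
    ([0, 0, 1, 0], "benign"),
    ([0, 0, 1, 1], "benign"),
    ([0, 0, 0, 1], "benign"),
]

label = knn_predict(TRAIN, [1, 1, 0, 0], k=3)
```

Each position in a vector records whether the app requests a given permission, so apps with similar permission sets end up close together under the Hamming distance.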

Experimental results and discussion

The proposed framework is tested on a small dataset, M0DROID [25]. This dataset contains different types of Android applications, such as games, selfie camera, torch apps, weather apps, map apps, music, health and many others. We tested our framework with all existing types of malware families; some of them are Plankton, DroidKungFu, GinMaster, FakeInstaller, Opfake, BaseBridge, Nisev, Adrd, Kmin, Geinimi, DroidDream, Imlog, Nandrobox, SmForw, FakeRun and many more. Our hybrid approach

Conclusion

In the last two decades, various malware detection approaches based on static and dynamic analysis have been developed; however, both have pros and cons. In this work, a hybrid approach using both static and dynamic analysis methods is presented that improves the overall accuracy of malware detection while compensating for the drawbacks of each method. Two types of features are extracted from various Android Apps. These features are broadly classified into three categories:

Acknowledgments

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2016-0-00312) supervised by the IITP (Institute for Information & communications Technology Promotion).

Competing interests

The authors declare that they have no competing or financial interests.

References (25)

  • G. Russello et al.

    Firedroid: hardening security in almost-stock android

  • G. Jacob et al.

    A static, packer-agnostic filter to detect similar malware samples

  • B. Sanz et al.

    Anomaly detection using string analysis for android malware detection

  • D. Arp et al.

    DREBIN: effective and explainable detection of android malware in your pocket

  • S.Y. Yerima et al.

    A new android malware detection approach using bayesian classification

  • S. Arzt

    Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps

  • A. Armando et al.

    Mobile app security analysis with the MAVeriC static analysis module

    JoWUA (2014)

  • P. Faruki et al.

    Androsimilar: robust statistical feature signature for android malware detection

  • M. Zheng et al.

    Droid analytics: a signature based analytic system to collect, extract, analyze and associate android malware

  • J. McClurg et al.

    Android privacy leak detection via dynamic taint analysis

    Electric. Eng. Comput. Sci. (2013)

  • L.K. Yan et al.

    Droidscope: seamlessly reconstructing the OS and dalvik semantic views for dynamic android malware analysis

  • V. Rastogi et al.

    AppsPlayground: automatic security analysis of smartphone applications


    Zahoor-ur-Rehman has experience in both academia and research. He received his educational and academic training at University of Peshawar, Foundation University Islamabad and UET Lahore, Pakistan. He joined COMSATS Institute of Information Technology as an assistant professor in early 2015. Along with his teaching responsibilities, he is an active researcher and a reviewer for various conferences and reputed journals.

    Sidra Nasim Khan received MCS and MSCS degrees from COMSATS Institute of Information Technology in 2014 and 2016. She joined COMSATS Attock campus, Pakistan, as a lecturer in 2017. Her areas of specialization are machine learning and smartphone security.

    Khan Muhammad (S’16) is a Research Associate at Intelligent Media Laboratory, Sejong University, Republic of Korea. He has authored over 30 papers in peer-reviewed international journals and conferences in the areas of image and video processing, information security, image and video steganography, video summarization, diagnostic hysteroscopy, wireless capsule endoscopy, computer vision, deep learning, and video surveillance.

    Jong Weon Lee received an M.S. degree in Electrical and Computer Engineering from University of Wisconsin at Madison in 1991, and a Ph.D. degree from University of Southern California in 2002. He is presently a Professor in the Department of Software at Sejong University. His research interests include augmented reality, human-computer interaction and serious games.

    Zhihan Lv is an engineer and researcher in virtual/augmented reality and multimedia, with a major in Mathematics and Computer Science. He has extensive work experience in virtual reality and augmented reality projects and is engaged in the application of computer visualization and computer vision. His research application fields range widely from everyday life to traditional research fields (e.g., geography, biology, medicine).

    Sung Wook Baik is currently a Full Professor and Dean of Digital Contents at Sejong University. He received his Ph.D. degree in information technology engineering from George Mason University, USA, in 1999. He has served as a professional reviewer for several well-reputed journals. His research interests include computer vision, multimedia, pattern recognition, machine learning, data mining, virtual reality, and computer games.

    Peer Azmat Shah specializes in Computer Networks, especially the design, implementation, and operation of the Future Internet, and is deeply involved in the research and policies surrounding the Internet. He has been an Assistant Professor and Head of Department in the Computer Science Department at COMSATS IIT, Attock, Pakistan since September 2014. He also leads the Internet, Communications and NETworks (ICNet) research lab at COMSATS.

    Khalid Mahmood Awan specializes in Computer Networks, especially resource management in wireless networks and the security requirements of networks. He has been an Assistant Professor and Deputy Head of Department in the Computer Science Department at COMSATS Institute of Information Technology, Attock, Pakistan since September 2016. He is also a member of the Internet, Communications and NETworks (ICNet) research lab at COMSATS.

    Irfan Mehmood is an Assistant Professor at Sejong University. He conducts research in a number of basic and applied areas such as image and scene segmentation, motion and video analysis, perceptual grouping, shape analysis and object recognition. His research methods emphasize the use of segmented (part-based) symbolic descriptions of objects.

    Reviews processed and recommended for publication to the Editor-in-Chief by Guest Editor Dr. S. Liu.
