Elsevier

Computers & Security

Volume 78, September 2018, Pages 429-453
Leveraging ontologies and machine-learning techniques for malware analysis into Android permissions ecosystems

https://doi.org/10.1016/j.cose.2018.07.013

Abstract

Smartphones form a complex application ecosystem with a myriad of components, properties, and interfaces that produce an intricate relationship network. Given the intrinsic complexity of this system, we make two main contributions. First, we devise a methodology to systematically determine and analyze the relationship network among components, properties, and interfaces associated with the permission mechanism in Android ecosystems. Second, we investigate whether malware samples share characteristics at this high level of abstraction that could be leveraged to reveal their presence. We propose an ontology-based framework to model the relationships between application and system elements, together with a machine-learning approach to analyze the complex network that arises from them. We represent the ontological model for the considered Android ecosystem, comprising 4570 apps, as a graph with some 55,000 nodes and 120,000 edges. Experiments show that a classifier operating on top of this representation achieves an accuracy of 88% and a precision of 91%, and identifies 24 features that correspond to 70 important graph nodes related to malware activity, a useful result for security analysis.

Introduction

Smartphones have become ubiquitous computing devices worldwide. A recent Ericsson Mobility Report (Carson et al., 2016) indicated that smartphones currently represent 55% of all mobile subscriptions globally. The report further projects the number of unique mobile subscribers to reach 6.1 billion by 2022, covering roughly 75% of the world’s population. Despite the multitude of different device models and the availability of several different operating systems for smartphones, the Android operating system currently holds 88% of market share (Sui, 2016).

Mobile devices are increasingly being used for activities that directly impact social, work, and financial environments; as such, they have become a primary target for cyber-criminals. A study published by Lee and Talbot (2016) concluded that, in the United Kingdom, the top ten uses for smartphones include social networking, emailing, banking, and shopping, with similar patterns across other developed countries. In the eyes of a cyber-criminal, social networks are a repository of the smartphone user’s personal information, work-related emails are a potential source of sensitive information, and banking apps are the gateway to the user’s finances (Bojjagani and Sastry, 2016; Chanajitt, Viriyasitavat, and Choo, 2016; Kadir, Stakhanova, and Ghorbani; Lee, Zhang, and Chen, 2013).

As a preventive security measure against unauthorized use or access, the Android ecosystem has a permission system for its applications (apps) (Enck et al., 2009). The permission system informs the user, prior to installation, of which system resources and information an app uses, so that the user can make an informed choice on whether or not to install that app. However, Kelley et al. (2012) and Felt et al. (2012b) have shown flaws in the use of the permission system as a preventive security measure. In particular, users tend not to pay attention to permissions and, more worryingly, permission systems sometimes fail to help users make sound security-related decisions. Furthermore, developers tend to overprivilege applications by requesting more permissions than necessary in anticipation of future releases (Felt et al., 2011; Felt et al., 2012a). Moreover, the Android documentation is itself flawed in how it maps permissions to system calls, as described by the developers of PScout (Au et al., 2012), a tool that intercepts system calls and tracks which permissions the operating system checks, thereby documenting which permissions are actually verified on each system-call access.

In fact, malicious apps can indirectly exploit a vulnerability in another app by controlling seemingly harmless system resources (Kelley et al., 2012). Given that the Android ecosystem has over 1.7 million apps and 235 different permissions (Olmstead and Atkinson, 2015), the task of mapping and analyzing relationships among permissions, malware, and benign apps is daunting and cannot be performed manually by a human curator. Accordingly, any methodology developed for this task must be extensible, automatic, and dynamic, so that new characteristics can be taken into consideration on the fly as apps, malware, and permissions are continuously added to or removed from the ecosystem.

Given the above, application testing on Android devices faces important challenges (Wang and Alshboul, 2015) that must be addressed. Within this context, the present contribution proposes two methods (described in Section 3): the first maps relationships in the Android ecosystem using ontologies, and the second is a machine-learning-based solution that analyzes malware features from the resulting network of relations and dependencies. We validate the effectiveness of these methods in Section 4 and show that they are able to determine the most important nodes related to malware activity, representing an important contribution to smartphone security.

Concepts and related work

Before we move on to the new methods we propose in this paper, we present a brief introduction to Android security, ontologies, and feature engineering using Bags of Graphs as well as the random forests classifier, which are necessary concepts to understand the paper. The expert reader can go directly to Section 3, where the new methods are introduced.

Proposed method

In this work, our primary goal is to analyze which permissions and resources are related to malicious apps in the Android ecosystem as represented in the Android manifests. We rely solely upon application manifest XML files as our source of information. The reasoning for this choice is that such files are publicly available and do not require any reverse engineering, code execution monitoring, or complicated code-level analysis to detect the presence of malware in a system, as described in
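As an illustration of using the manifest as the only information source, the following minimal sketch lists the permissions an app requests. It is not the authors' implementation. It assumes the manifest is already available as plain-text XML (inside an APK the file is stored in a binary XML encoding and would first need decoding), and the function name `requested_permissions` and the sample manifest are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace that Android uses for the "android:" attribute prefix.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def requested_permissions(manifest_xml: str) -> List[str]:
    """Return the sorted, de-duplicated permission names requested
    through <uses-permission> elements in a plain-text manifest."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS  # fully qualified android:name
    names = set()
    for element in root.iter("uses-permission"):
        name = element.get(name_attr)
        if name:
            names.add(name)
    return sorted(names)


# Hypothetical manifest used only for this example.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <application android:label="Demo"/>
</manifest>"""

permissions = requested_permissions(SAMPLE_MANIFEST)
print(permissions)
```

The duplicate `INTERNET` entry in the sample is collapsed, so each permission appears once per app.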

Experiments and results

In the following sections, we report on the experiments conducted to verify the method proposed in Section 3 with real-world data. In Section 4.1, we describe the metrics used to evaluate the performance of classifiers; in Section 4.2, we explain the Android ecosystem used in the experiments, which was transformed by the pre-processing method described in Sections 3.1 and 3.2 into the feature dataset. The full dataset was broken down into two partitions, one for the fitting process and another
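The abstract reports accuracy and precision for the classifier. The exact metric definitions of Section 4.1 are not shown in this excerpt, so the sketch below assumes the standard confusion-matrix definitions, with malware as the positive class (label 1) and benign apps as the negative class (label 0). The function names and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, tn, fp, fn) for binary labels, where 1 = malware."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all apps that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of apps flagged as malware that really are malware."""
    return tp / (tp + fp)


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), precision(tp, fp))
```

Precision is the natural companion to accuracy here because it measures how often a malware alert is correct, which matters when benign apps far outnumber malicious ones.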

Conclusion and future work

In this paper, we have introduced two new methods to address the problem of mapping the relationships and characteristics of malicious software in smartphones. We provided an extensible framework for mapping the analyzed elements in the Android system using ontologies, as well as a random forest-based method for automatically extracting meaningful information from the ontological map obtained from the new mapping algorithm. Experimental results in the considered Android ecosystem showed that
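To make the mapping step concrete, the sketch below stores relations as subject-predicate-object triples and derives an undirected graph from them. It is only an assumed, simplified stand-in for the ontological map: the triple vocabulary (`requests`, `protects`), the node names, and the function `build_adjacency` are hypothetical, and the random-forest analysis is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_adjacency(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map; predicates are ignored
    because only node connectivity is needed in this sketch."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for subject, _predicate, obj in triples:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


# Hypothetical relations between apps, permissions, and resources.
TRIPLES = [
    ("app:flashlight", "requests", "perm:CAMERA"),
    ("app:flashlight", "requests", "perm:SEND_SMS"),
    ("app:notes", "requests", "perm:INTERNET"),
    ("perm:SEND_SMS", "protects", "res:sms_service"),
]

graph = build_adjacency(TRIPLES)
node_count = len(graph)
# Each undirected edge is stored twice, once per endpoint.
edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
print(node_count, edge_count)
```

Because new triples can be appended at any time, a representation like this stays extensible as apps and permissions are added to the ecosystem.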

Acknowledgment

We are grateful for the financial support of the Intel Strategic Research Alliance (Grant #440850/2013-4), the National Council for Scientific and Technological Development – CNPq (Grant #302224/2015-7), the São Paulo Research Foundation (FAPESP) (DéjàVu Grant #2017/12646-3), and the Coordination for the Improvement of Higher Education Personnel – Capes (DeepEyes grant), as well as Cambridge Trusts-CAPES grant BEX 9407-11-1.

References (72)

  • L. Breiman. Bagging predictors. Mach Learn (1996)
  • L. Breiman. Out-of-bag estimation. Technical Report (1996)
  • L. Breiman. Random forests. Mach Learn (2001)
  • L. Breiman et al. Classification and regression trees (1984)
  • Carson S., Furuskär A., Jonsson P., Kronander J., Lindberg P., Ludwig R., Öhman K., Sehti J.S. Ericsson mobility report....
  • R. Caruana et al. An empirical comparison of supervised learning algorithms. Proceedings of the 23rd international conference on machine learning (ICML ’06) (2006)
  • R. Chanajitt et al. Forensic analysis and security assessment of android m-banking apps. Aust J Forensic Sci (2016)
  • Community V. Apk malware samples acquired from a torrent. 2017a. Accessed:...
  • Community V. Virustotal public api v2.0. 2017b. Accessed:...
  • A. Criminisi et al. Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends® Comput Graph Vis (2012)
  • S. Das et al. Semantics-based online malware detection: towards efficient real-time protection against malware. IEEE Trans Inf Forensics Secur (2016)
  • Eddy M. Mobile threat monday: Android apps hide windows malware. 2014. Accessed:...
  • K. Eilbeck et al. The sequence ontology: a tool for the unification of genome annotations. Genome Biol (2005)
  • N. Elenkov. Android security internals: an in-depth guide to Android’s security architecture (2014)
  • W. Enck et al. Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. Proceedings of the 9th USENIX conference on operating systems design and implementation (OSDI’10) (2010)
  • W. Enck et al. Understanding android security. IEEE Secur Privacy (2009)
  • P. Faruki et al. Android security: a survey of issues, malware penetration and defenses. IEEE Commun Surv Tutor (2015)
  • A.P. Felt et al. Android permissions demystified. Proceedings of the 18th ACM conference on computer and communications security (CCS ’11) (2011)
  • A.P. Felt et al. How to ask for permission. Proceedings of the 7th USENIX conference on hot topics in security (HotSec’12) (2012)
  • A.P. Felt et al. Android permissions: user attention, comprehension, and behavior. Proceedings of the eighth symposium on usable privacy and security (SOUPS ’12) (2012)
  • Fenz S. Ontology-and Bayesian-based information security risk management....
  • S. Fenz et al. Formalizing information security knowledge. Proceedings of the 4th international symposium on information, computer, and communications security (ASIACCS ’09) (2009)
  • Google. Google play. 2017. Accessed:...
  • T. Gruber. Ontology. Encyclopedia of database systems (2009)
  • N. Guarino et al. What is an ontology? Handbook on ontologies (2009)
  • M. Hartung et al. Recent advances in schema and ontology evolution

    Luiz C. Navarro is an electronic engineer specializing in digital systems who graduated in 1982 from the Polytechnic School of the University of Sao Paulo and has extensive experience in software development, system integration, and software architecture. Currently, he is a master’s student in Computer Science at the Institute of Computing of the University of Campinas (UNICAMP), focusing his research on systems security, Android security, digital forensics, ontologies and machine learning.

    Alexandre K. W. Navarro is a Machine Learning Ph.D. student at the University of Cambridge Engineering Department. His major academic interests lie in approximate inference, probabilistic graphical models and machine learning. He also holds an M.Sc. and a B.Sc. in Chemical Engineering from the University of Campinas (UNICAMP) with an emphasis in control systems, optimization and simulation.

    Andre Gregio is an Assistant Professor at the Federal University of Parana, Brazil (UFPR). His research interests include several aspects of computer and network security, such as countermeasures against malicious code, security data visualization/analysis, and mobile security. Prof. Gregio is funded by the Brazilian National Council of Technological and Scientific Development (CNPq) and the Brazilian Ministry of Health. In 2017, Prof. Gregio was awarded the Google Latin America Research Award for his proposal on automatic detection of concept drift in malware classifiers.

    Anderson Rocha is an associate professor at the Institute of Computing, University of Campinas. His main interests include Reasoning for Complex Data, Digital Forensics and Machine Intelligence. He is an IEEE Senior Member, an elected affiliate member of the Brazilian Academy of Sciences (ABC) and of the IEEE Information Forensics and Security Technical Committee. He is a Microsoft Research Faculty Fellow, a Google Research Faculty Fellow and a Tan Chin Tuan Fellow. Finally, he is currently the principal investigator of a number of research projects in partnership with public funding agencies and multinational companies, and has already licensed several patents.

    Ricardo Dahab is an associate professor at the University of Campinas’ (UNICAMP) Institute of Computing. He holds a Computer Science Master’s degree from UNICAMP and a Ph.D. in Combinatorics and Optimization from the University of Waterloo. His teaching and research interests are in Cryptography and Information Security. In academic research, his main contributions are in elliptic curve-based cryptographic methods, some of which have become industry standards. Prof. Dahab has also been engaged in several R&D projects in partnership with industry and other research institutions, which have produced successful products, among them the official HSM (Hardware Security Module) supporting the Brazilian PKI’s root certification authority. He has been an active member in joint efforts by the security community in Brazil and Latin America to promote and consolidate the area in the region, having served on several committees and organized events such as the 2009 Brazilian Symposium on Information and Systems Security (SBSeg), the 2011 Advanced School of Cryptography, the Latincrypt School in 2011 and 2013, the Cryptology and Network Security Symposium (CANS 2013), and PKC 2018, among others. He has also contributed to the creation and expansion in Brazil and Latin America of ACM’s International Collegiate Programming Contest, of which he is Latin America’s Director of Contests. He was one of the recipients of UNICAMP’s 2011 Zeferino Vaz academic excellence award and of UNICAMP’s 2013 Inventors Award.
