Leveraging ontologies and machine-learning techniques for malware analysis into Android permissions ecosystems
Introduction
Smartphones have become ubiquitous computing devices worldwide. A recent Ericsson Mobility Report (Carson et al., 2016) indicated that smartphones currently represent 55% of all mobile subscriptions globally. The report further projects the number of unique mobile subscribers to reach 6.1 billion by 2022, covering roughly 75% of the world’s population. Despite the multitude of different device models and the availability of several different operating systems for smartphones, the Android operating system currently holds 88% of market share (Sui, 2016).
Mobile devices are increasingly used for activities that directly impact social, work, and financial environments; as such, they have become a primary target for cyber-criminals. A study published by Lee and Talbot (2016) concluded that, in the United Kingdom, the top ten uses for smartphones include social networking, emailing, banking, and shopping, with similar patterns across other developed countries. In the eyes of a cyber-criminal, social networks are a repository of the smartphone user’s personal information, work-related emails are a potential source of sensitive information, and banking apps are the gateway to the user’s finances (Bojjagani, Sastry, 2016, Chanajitt, Viriyasitavat, Choo, 2016, Kadir, Stakhanova, Ghorbani, Lee, Zhang, Chen, 2013).
As a prophylactic security measure against unauthorized use or access, the Android ecosystem has a permission system for its applications (apps) (Enck et al., 2009). The permission system informs the user of which system resources and information an app uses prior to installation, so that the user can make an informed choice on whether or not to install that app. However, Kelley et al. (2012) and Felt et al. (2012b) have shown flaws in the use of the permission system as a preventive security measure. In particular, users tend not to pay attention to permissions and, more worryingly, permission systems sometimes fail to help users make sound security-related decisions. Furthermore, developers tend to overprivilege applications, requesting more permissions than necessary in anticipation of future releases (Felt, Chin, Hanna, Song, Wagner, 2011, Felt, Egelman, Finifter, Akhawe, Wagner, 2012a). Moreover, the Android documentation has flaws in mapping permissions to system calls, as described in the study by the developers of Pscout (Au et al., 2012), a tool that intercepts system calls and keeps track of which permissions are tested by the operating system, thereby documenting which permissions are actually verified in each system-call access.
In fact, malicious apps can control seemingly harmless system resources to indirectly exploit a vulnerability in another app (Kelley et al., 2012). Given that the Android ecosystem has over 1.7 million apps and 235 different permissions (Olmstead and Atkinson, 2015), the task of mapping and analyzing relationships among permissions, malware, and benign apps is daunting and cannot realistically be performed manually by a human curator. Consequently, any methodology developed for this task must be extensible, automatic, and dynamic, so that new characteristics can be taken into consideration on the fly as apps, malware, and permissions are continuously added to or removed from the ecosystem.
Given the above, application testing on Android devices faces important challenges (Wang and Alshboul, 2015) that must be addressed. Within this context, the present contribution proposes two methods (described in Section 3): the first maps relationships in the Android ecosystem using ontologies, and the second is a machine-learning-based solution that analyzes malware features from the resulting network of relations and dependencies. We validate the effectiveness of these methods in Section 4 and show that they are able to determine the most important nodes related to malware activity, representing an important contribution to smartphone security.
Concepts and related work
Before we move on to the new methods we propose in this paper, we present a brief introduction to Android security, ontologies, and feature engineering using Bags of Graphs as well as the random forests classifier, which are necessary concepts to understand the paper. The expert reader can go directly to Section 3, where the new methods are introduced.
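To make the random forest concept concrete, the following is a minimal sketch rather than the classifier used in this paper. It assumes binary features, uses one-level trees (decision stumps) instead of full trees, and scores features by mean decrease in Gini impurity. The toy data and the permission names in the comments are fabricated for illustration only.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(rows, labels, candidate_features):
    """Fit a one-level tree on binary features; return (feature, gain, leaves)."""
    parent = gini(labels)
    overall = Counter(labels).most_common(1)[0][0]
    best = None
    for f in candidate_features:
        sides = {0: [], 1: []}
        for x, y in zip(rows, labels):
            sides[x[f]].append(y)
        weighted = sum(len(s) * gini(s) for s in sides.values()) / len(labels)
        gain = parent - weighted
        if best is None or gain > best[1]:
            # Each leaf predicts its majority label (overall majority if empty).
            leaves = {v: Counter(s).most_common(1)[0][0] if s else overall
                      for v, s in sides.items()}
            best = (f, gain, leaves)
    return best


def fit_forest(rows, labels, n_trees=50, features_per_tree=2, seed=0):
    """Train stumps on bootstrap samples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest, importance = [], [0.0] * n_features
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        boot_rows = [rows[i] for i in picks]
        boot_labels = [labels[i] for i in picks]
        candidates = rng.sample(range(n_features), features_per_tree)
        f, gain, leaves = fit_stump(boot_rows, boot_labels, candidates)
        forest.append((f, leaves))
        importance[f] += gain / n_trees  # mean decrease in impurity
    return forest, importance


def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = Counter(leaves[row[f]] for f, leaves in forest)
    return votes.most_common(1)[0][0]


# Toy data (fabricated): columns are hypothetical binary permission flags
# [SEND_SMS, INTERNET, CAMERA]; label 1 means "malware".
# Here the label simply equals the SEND_SMS flag.
ROWS = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 5
LABELS = [row[0] for row in ROWS]

forest, importance = fit_forest(ROWS, LABELS)
```

Because the toy label depends only on the first column, that feature accumulates by far the largest importance score, which is the kind of signal a forest can provide about which inputs matter most.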
Proposed method
In this work, our primary goal is to analyze which permissions and resources are related to malicious apps in the Android ecosystem as represented in the Android manifests. We rely solely upon application manifest XML files as our source of information. The reasoning for this choice is that such files are publicly available and do not require any reverse engineering, code execution monitoring, or complicated code-level analysis to detect the presence of malware in a system, as described in
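As an illustration of using the manifest as the only data source, the sketch below reads the requested permissions from a manifest and expresses them as subject, predicate, object triples. It is a minimal example under stated assumptions, not the mapping algorithm of this paper: the manifest is assumed to be already available as plain-text XML (manifests packaged inside an APK are stored in a binary XML format and must be decoded first), and the sample manifest, the function name `manifest_to_triples`, and the predicate name `requestsPermission` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace that Android uses for "android:"-prefixed manifest attributes.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical manifest, invented for this example.
SAMPLE_MANIFEST = """\
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Demo" />
</manifest>
"""


def manifest_to_triples(xml_text):
    """Return (app, predicate, permission) triples for each requested permission."""
    root = ET.fromstring(xml_text)
    app = root.get("package")
    # ElementTree spells namespaced attributes as "{namespace}localname".
    name_attr = f"{{{ANDROID_NS}}}name"
    triples = []
    for node in root.iter("uses-permission"):
        permission = node.get(name_attr)
        if permission:
            # "requestsPermission" is a hypothetical predicate name.
            triples.append((app, "requestsPermission", permission))
    return triples


for triple in manifest_to_triples(SAMPLE_MANIFEST):
    print(triple)
```

Triples of this shape are a common way to feed app-to-permission relations into an ontology or graph store; other manifest elements could be mapped in the same manner.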
Experiments and results
In the following sections, we report on the experiments conducted to verify the method proposed in Section 3 with real-world data. In Section 4.1, we describe the metrics used to evaluate the performance of classifiers; in Section 4.2, we explain the Android ecosystem used in the experiments, which was transformed by the pre-processing method described in Sections 3.1 and 3.2 into the feature dataset. The full dataset was broken down into two partitions, one for the fitting process and another
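The two ingredients mentioned above, a two-way partition of the dataset and classifier performance metrics, can be sketched as follows. This is only an illustration: the metrics actually used are those defined in Section 4.1 and may differ from the accuracy, precision, and recall shown here, and the function names, the 75% split fraction, and the sample labels are assumptions made for the example.

```python
import random


def split_two_partitions(samples, fit_fraction=0.75, seed=42):
    """Shuffle a copy of the samples and cut it into (fit, held_out) partitions."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fit_fraction)
    return shuffled[:cut], shuffled[cut:]


def confusion_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and three common derived metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Fabricated labels for illustration: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = confusion_metrics(y_true, y_pred)

fit_part, held_out_part = split_two_partitions(range(8))
```

Keeping the held-out partition untouched during fitting is what lets the reported metrics estimate performance on unseen apps.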
Conclusion and future work
In this paper, we have introduced two new methods to address the problem of mapping the relationships and characteristics of malicious software in smartphones. We provided an extensible framework for mapping the analyzed elements in the Android system using ontologies, as well as a random forest-based method for automatically extracting meaningful information from the ontological map obtained from the new mapping algorithm. Experimental results in the considered Android ecosystem showed that
Acknowledgment
We acknowledge the financial support of the Intel Strategic Research Alliance (Grant #440850/2013-4), the National Council for Scientific and Technological Development – CNPq (Grant #302224/2015-7), the São Paulo Research Foundation (FAPESP) (DéjàVu Grant #2017/12646-3), and the Coordination for the Improvement of Higher Education Personnel – Capes (DeepEyes grant), as well as Cambridge Trusts-CAPES grant BEX 9407-11-1.
References (72)
- Revisiting security ontologies. Int J Comput Sci Issues (2014)
- Apk auditor: permission-based android malware detection system. Digit Investig (2015)
- Mobile security testing approaches and challenges. Proceedings of the 2015 first conference on mobile and secure services (MOBISECSERV) (2015)
- Semantics-aware android malware classification using weighted contextual api dependency graphs. Proceedings of the 2014 ACM SIGSAC conference on computer and communications security (CCS ’14) (2014)
- Permutation importance: a corrected feature importance measure. Bioinformatics (2010)
- Apps permissions in the Google Play Store. Technical Report (2015)
- Pscout: analyzing the android permission specification. Proceedings of the 2012 ACM conference on computer and communications security (CCS ’12) (2012)
- RDF 1.1 N-triples. Technical Report (2014)
- RDF 1.1 turtle – terse RDF triple language. Technical Report (2014)
- Stamba: security testing for android mobile banking apps
- Bagging predictors. Mach Learn
- Out-of-bag estimation. Technical Report
- Random forests. Mach Learn
- Classification and regression trees
- An empirical comparison of supervised learning algorithms. Proceedings of the 23rd international conference on machine learning (ICML ’06)
- Forensic analysis and security assessment of android m-banking apps. Aust J Forensic Sci
- Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning. Found Trends® Comput Graph Vis
- Semantics-based online malware detection: towards efficient real-time protection against malware. IEEE Trans Inf Forensics Secur
- The sequence ontology: a tool for the unification of genome annotations. Genome Biol
- Android security internals: an in-depth guide to Android’s security architecture
- Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. Proceedings of the 9th USENIX conference on operating systems design and implementation (OSDI’10)
- Understanding android security. IEEE Secur Privacy
- Android security: a survey of issues, malware penetration and defenses. IEEE Commun Surv Tutor
- Android permissions demystified. Proceedings of the 18th ACM conference on computer and communications security (CCS ’11)
- How to ask for permission. Proceedings of the 7th USENIX conference on hot topics in security (HotSec’12)
- Android permissions: user attention, comprehension, and behavior. Proceedings of the eighth symposium on usable privacy and security (SOUPS ’12)
- Formalizing information security knowledge. Proceedings of the 4th international symposium on information, computer, and communications security (ASIACCS ’09)
- Ontology. Encyclopedia of database systems
- What is an ontology? Handbook on ontologies
- Recent advances in schema and ontology evolution
Luiz C. Navarro is an electronic engineer with a specialization in digital systems. He graduated in 1982 from the Polytechnic School of the University of Sao Paulo and has extensive experience in the software development, system integration, and software architecture market. Currently, he is a master’s student in Computer Science at the Institute of Computing of the University of Campinas (UNICAMP), focusing his research on systems security, Android security, digital forensics, ontologies and machine learning.
Alexandre K. W. Navarro is a Machine Learning Ph.D. student at the University of Cambridge Engineering Department. His major academic interests lie in approximate inference, probabilistic graphical models and machine learning. He also holds an M.Sc. and a B.Sc. in Chemical Engineering from the University of Campinas (UNICAMP) with an emphasis in control systems, optimization and simulation.
Andre Gregio is an Assistant Professor at the Federal University of Parana, Brazil (UFPR). His research interests include several aspects of computer and network security, such as countermeasures against malicious code, security data visualization/analysis, and mobile security. Prof. Gregio is funded by the Brazilian National Council for Scientific and Technological Development (CNPq) and the Brazilian Ministry of Health. In 2017, Prof. Gregio was awarded the Google Latin America Research Award for his proposal on automatic detection of concept drift in malware classifiers.
Anderson Rocha is an associate professor at the Institute of Computing, University of Campinas. His main interests include Reasoning for Complex Data, Digital Forensics and Machine Intelligence. He is an IEEE Senior Member, an elected affiliate member of the Brazilian Academy of Sciences (ABC) and a member of the IEEE Information Forensics and Security Technical Committee. He is a Microsoft Research Faculty Fellow, a Google Research Faculty Fellow and a Tan Chin Tuan Fellow. Finally, he is currently the principal investigator of a number of research projects in partnership with public funding agencies and multinational companies, and has already licensed several patents.
Ricardo Dahab is an associate professor at the Institute of Computing of the University of Campinas (UNICAMP). He holds a Master's degree in Computer Science from UNICAMP and a Ph.D. in Combinatorics and Optimization from the University of Waterloo. His teaching and research interests are in Cryptography and Information Security. In academic research, his main contributions are in elliptic curve-based cryptographic methods, some of which have become industry standards. Prof. Dahab has also been engaged in several R&D projects in partnership with industry and other research institutions, which have resulted in successful products, among them the official HSM (Hardware Security Module) supporting the Brazilian PKI’s root certification authority. He has been an active member of joint efforts by the security community in Brazil and Latin America to promote and consolidate the area in the region, having served on several committees and organized events such as the 2009 Brazilian Symposium on Information and Systems Security (SBSeg), the 2011 Advanced School of Cryptography, the Latincrypt School in 2011 and 2013, the Cryptology and Network Security Symposium (CANS 2013), and PKC 2018, among others. He has also contributed to the creation and expansion in Brazil and Latin America of ACM’s International Collegiate Programming Contest, of which he is Latin America’s Director of Contests. He was one of the recipients of UNICAMP’s 2011 Zeferino Vaz academic excellence award and of UNICAMP’s 2013 Inventors Award.