Abstract
Android security permissions are built-in security features that constrain what an app can do and access on the system, that is, its privileges. Permissions have been widely used for Android malware detection, mostly in combination with other relevant app attributes. The available set of permissions is dynamic and is refined with every new Android OS release: the refinement process adds new permissions and deprecates others. These changes directly impact the type and prevalence of permissions requested by malware and legitimate applications over time. Furthermore, malware trends and the inherent evolution of benign apps influence the permissions they request. The usage of these features in machine learning-based malware detection systems is therefore prone to concept drift issues. Despite this, no previous permission-related study has taken concept drift into account. In this study, we demonstrate that when concept drift is addressed, permissions can generate long-lasting and effective malware detection systems. Furthermore, the discriminatory capabilities of distinct feature sets are tested. We found that the initial set of permissions, defined in Android 1.0 (API level 1), is sufficient to build an effective detection model, providing an average F1 score of 0.93 on data spanning seven years. In addition, we explored and characterized permission evolution using local and global interpretation methods. In this regard, the varying importance of individual permissions for the malware and benign software recognition tasks over time is analyzed.
References
Hautala, L.: Android malware tries to trick you. Here’s how to spot it. https://www.cnet.com/tech/services-and-software/android-malware-tries-to-trick-you-heres-how-to-spot-it/ (2021)
Palmer, D.: Sophisticated android malware spies on smartphones users and runs up their phone bill too. https://www.zdnet.com/article/sophisticated-android-malware-spies-on-smartphones-users-and-runs-up-their-phone-bill-too/ (2018)
Yaswant, A.: New advanced android malware posing as “System Update”. https://blog.zimperium.com/new-advanced-android-malware-posing-as-system-update/ (2021)
O’Dea, S.: Mobile operating systems’ market share worldwide from January 2012 to June 2021. https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009/ (2021)
Kaspersky: Can you get viruses on android? every android user is at risk. https://www.kaspersky.com/resource-center/preemptive-safety/android-malware-risk (2021)
Velzian, B.: Calling all threat hunters—mobile malware to look out for in 2021. https://www.wandera.com/calling-all-threat-hunters-mobile-malware-to-look-out-for-in-2021/ (2021)
Android: App permissions best practices. https://developer.android.com/training/permissions/usage-notes (2021)
Google: Google play protect. https://developers.google.com/android/play-protect (2021)
Samsung: This is protection, samsung knox. https://www.samsungknox.com/en/secured-by-knox (2021)
Withwam, R.: Android antivirus apps are useless—here’s what to do instead. https://www.extremetech.com/computing/104827-android-antivirus-apps-are-useless-heres-what-to-do-instead (2020)
Lakshmanan, R.: Joker malware apps once again bypass Google’s security to spread via play store. https://thehackernews.com/2020/07/joker-android-mobile-virus.html (2020)
Chebyshev, V.: Mobile malware evolution 2020. https://securelist.com/mobile-malware-evolution-2020/101029 (2021)
Faruki, P., Ganmoor, V., Laxmi, V., Gaur, M.S., Bharmal, A.: Androsimilar: robust statistical feature signature for android malware detection. In: Proceedings of the 6th International Conference on Security of Information and Networks, pp. 152–159 (2013)
Feizollah, A., Anuar, N.B., Salleh, R., Wahab, A.W.A.: A review on feature selection in mobile malware detection. Digit. Investig. 13, 22–37 (2015)
Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., Siemens, C.: Drebin: effective and explainable detection of android malware in your pocket. In: NDSS, vol. 14, pp. 23–26 (2014)
Lipovský, R., Štefanko, L., Braniša, G.: The rise of android ransomware. https://www.welivesecurity.com/wp-content/uploads/2016/02/Rise_of_Android_Ransomware.pdf (2016)
Mathur, A., Podila, L.M., Kulkarni, K., Niyaz, Q., Javaid, A.Y.: Naticusdroid: a malware detection framework for android using native and custom permissions. J. Inf. Secur. Appl. 58, 102696 (2021)
Khariwal, K., Singh, J., Arora, A.: Ipdroid: android malware detection using intents and permissions. In: 2020 Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4), pp. 197–202. IEEE (2020)
Android: Request app permissions. https://developer.android.com/training/permissions/requesting (2021)
Android: Permissions on android. https://developer.android.com/guide/topics/permissions/overview (2021)
Android: Manifest.permission. https://developer.android.com/reference/android/Manifest.permission (2021)
Android: Define a custom app permission. https://developer.android.com/guide/topics/permissions/defining (2021)
Codepath: Understanding app permissions. https://guides.codepath.com/android/Understanding-App-Permissions (2021)
Android: App permissions best practices. https://developer.android.com/training/permissions/usage-notes (2021)
Android: Permissions updates in android 11. https://developer.android.com/about/versions/11/privacy/permissions (2021)
Raphael, J.R.: Android versions: a living history from 1.0 to 12. https://www.computerworld.com/article/3235946/android-versions-a-living-history-from-1-0-to-today.html (2021)
Milosevic, N., Dehghantanha, A., Choo, K.-K.R.: Machine learning aided android malware classification. Comput. Electr. Eng. 61, 266–274 (2017)
Zhu, H.-J., You, Z.-H., Zhu, Z.-X., Shi, W.-L., Chen, X., Cheng, L.: Droiddet: effective and robust detection of android malware using static analysis along with rotation forest model. Neurocomputing 272, 638–646 (2018)
Talha, K.A., Alper, D.I., Aydin, C.: Apk auditor: permission-based android malware detection system. Digit. Investig. 13, 1–14 (2015)
Rovelli, P., Vigfússon, Ý.: Pmds: permission-based malware detection system. In: International Conference on Information Systems Security, pp. 338–357. Springer (2014)
Sanz, B., Santos, I., Laorden, C., Ugarte-Pedrero, X., Bringas, P.G., Álvarez, G.: Puma: permission usage to detect malware in android. In: International Joint Conference CISIS’12-ICEUTE 12-SOCO 12 Special Sessions, pp. 289–298. Springer (2013)
Zarni Aung, W.Z.: Permission-based android malware detection. Int. J. Sci. Technol. Res. 2, 228–234 (2013)
Wang, W., Wang, X., Feng, D., Liu, J., Han, Z., Zhang, X.: Exploring permission-induced risk in android applications for malicious application detection. IEEE Trans. Inf. Forensics Secur. 9, 1869–1882 (2014)
Ghasempour, A., Sani, N.F.M., Abari, O.J.: Permission extraction framework for android malware detection. Int. J. Adv. Comput. Sci. Appl. 11(11) (2020)
Arora, A., Peddoju, S.K., Conti, M.: Permpair: android malware detection using permission pairs. IEEE Trans. Inf. Forensics Secur. 15, 1968–1982 (2019)
Liu, X., Liu, J.: A two-layered permission-based android malware detection scheme. In: 2014 2nd IEEE International Conference on Mobile Cloud Computing, Services, and Engineering, pp. 142–148 (2014). https://doi.org/10.1109/MobileCloud.2014.22
Moonsamy, V., Rong, J., Liu, S.: Mining permission patterns for contrasting clean and malicious android applications. Futur. Gener. Comput. Syst. 36, 122–132 (2014)
Sokolova, K., Perez, C., Lemercier, M.: Android application classification and anomaly detection with graph-based permission patterns. Decis. Support Syst. 93, 62–76 (2017)
Wang, C., Xu, Q., Lin, X., Liu, S.: Research on data mining of permissions mode for android malware detection. Clust. Comput. 22, 13337–13350 (2019)
Idrees, F., Rajarajan, M., Conti, M., Chen, T.M., Rahulamathavan, Y.: Pindroid: a novel android malware detection system using ensemble learning methods. Comput. Secur. 68, 36–46 (2017)
Sanz, B., Santos, I., Laorden, C., Ugarte-Pedrero, X., Nieves, J., Bringas, P.G., Álvarez Marañón, G.: Mama: manifest analysis for malware detection in android. Cybern. Syst. 44, 469–488 (2013)
Arslan, R.S., Ölmez, E., Er, O.: Afwdroid: deep feature extraction and weighting for android malware detection. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi 12, 237–245 (2021)
Alazab, M., Alazab, M., Shalaginov, A., Mesleh, A., Awajan, A.: Intelligent mobile malware detection using permission requests and API calls. Futur. Gener. Comput. Syst. 107, 509–521 (2020)
Tao, G., Zheng, Z., Guo, Z., Lyu, M.R.: Malpat: mining patterns of malicious and benign android apps via permission-related APIs. IEEE Trans. Reliab. 67, 355–369 (2017)
Kim, T., Kang, B., Rho, M., Sezer, S., Im, E.G.: A multimodal deep learning method for android malware detection using various features. IEEE Trans. Inf. Forensics Secur. 14, 773–788 (2018)
Hu, D., Ma, Z., Zhang, X., Li, P., Ye, D., Ling, B.: The concept drift problem in android malware detection and its solution. Secur. Commun. Netw. 2017 (2017)
Guerra-Manzanares, A., Nõmm, S., Bahsi, H.: In-depth feature selection and ranking for automated detection of mobile malware. In: ICISSP, pp. 274–283 (2019)
Zhou, Y., Wang, Z., Zhou, W., Jiang, X.: Hey, you, get off of my market: detecting malicious apps in official and alternative android markets. In: NDSS, vol. 25, pp. 50–52 (2012)
Lindorfer, M., Neugschwandtner, M., Platzer, C.: Marvin: Efficient and comprehensive mobile app classification through static and dynamic analysis. In: IEEE 39th Annual Computer Software and Applications Conference, vol. 2, pp. 422–433. IEEE (2015)
Arora, A., Peddoju, S.K.: Ntpdroid: a hybrid android malware detector using network traffic and system permissions. In: 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), pp. 808–813. IEEE (2018)
Arora, A., Peddoju, S.K., Chouhan, V., Chaudhary, A.: Hybrid android malware detection by combining supervised and unsupervised learning. In: Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, pp. 798–800 (2018)
Zhou, Y., Jiang, X.: Dissecting android malware: characterization and evolution. In: IEEE Symposium on Security and Privacy, pp. 95–109. IEEE (2012)
Guerra-Manzanares, A., Bahsi, H., Nõmm, S.: Kronodroid: time-based hybrid-featured dataset for effective android malware detection and characterization. Comput. Secur. 110, 102399 (2021)
Mila: Contagio mobile. http://contagiominidump.blogspot.com/ (2018)
Arp, D., Quiring, E., Pendlebury, F., Warnecke, A., Pierazzi, F., Wressnegger, C., Cavallaro, L., Rieck, K.: Dos and don’ts of machine learning in computer security (2020). arXiv preprint arXiv:2010.09470
Pendlebury, F., Pierazzi, F., Jordaney, R., Kinder, J., Cavallaro, L.: TESSERACT: eliminating experimental bias in malware classification across space and time. In: 28th USENIX Security Symposium (USENIX Security 19), pp. 729–746 (2019)
Allix, K., Bissyandé, T.F., Klein, J., Le Traon, Y.: Are your training datasets yet relevant? In: International Symposium on Engineering Secure Software and Systems, pp. 51–67. Springer (2015)
Cen, L., Gates, C.S., Si, L., Li, N.: A probabilistic discriminative model for android malware detection with decompiled source code. IEEE Trans. Dependable Secure Comput. 12, 400–412 (2015)
Xu, K., Li, Y., Deng, R., Chen, K., Xu, J.: Droidevolver: self-evolving android malware detection system. In: IEEE European Symposium on Security and Privacy (EuroS&P), pp. 47–62. IEEE (2019)
Lei, T., Qin, Z., Wang, Z., Li, Q., Ye, D.: Evedroid: event-aware android malware detection against model degrading for ioT devices. IEEE Internet Things J. 6, 6668–6680 (2019)
Guerra-Manzanares, A., Luckner, M., Bahsi, H.: Android malware concept drift using system calls: detection, characterization and challenges. Expert Syst. Appl. 117200 (2022)
Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., Zhang, G.: Learning under concept drift: a review. IEEE Trans. Knowl. Data Eng. 31, 2346–2363 (2018)
Lu, N., Zhang, G., Lu, J.: Concept drift detection via competence models. Artif. Intell. 209, 11–28 (2014)
Jordaney, R., Sharad, K., Dash, S.K., Wang, Z., Papini, D., Nouretdinov, I., Cavallaro, L.: Transcend: detecting concept drift in malware classification models. In: Proceedings of the 26th USENIX Security Symposium, pp. 625–642 (2017)
Hooker, G., Mentch, L.: Please stop permuting features: an explanation and alternatives (2019). arXiv preprint arXiv:1905.03151
Samara, B., Randles, R.H.: A test for correlation based on Kendall’s tau. Commun. Stat. Theory Methods 17, 3191–3205 (1988)
Aggarwal, C.C.: Data Mining: The Textbook. Springer, Berlin (2015)
Zyblewski, P., Sabourin, R., Woźniak, M.: Preprocessed dynamic classifier ensemble selection for highly imbalanced drifted data streams. Inf. Fusion 66, 138–154 (2021)
Guerra-Manzanares, A., Nõmm, S., Bahsi, H.: Time-frame analysis of system calls behavior in machine learning-based mobile malware detection. In: 2019 International Conference on Cyber Security for Emerging Technologies (CSET), pp. 1–8 (2019). https://doi.org/10.1109/CSET.2019.8904908
Guerra-Manzanares, A., Bahsi, H., Nõmm, S.: Differences in android behavior between real device and emulator: a malware detection perspective. In: 2019 Sixth International Conference on Internet of Things: Systems, Management and Security (IOTSMS), pp. 399–404 (2019). https://doi.org/10.1109/IOTSMS48152.2019.8939268
Maimon, O., Rokach, L. (eds.): Data Mining and Knowledge Discovery Handbook. A Complete Guide for Practitioners and Researchers. Springer, San Francisco (2005)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Altmann, A., Toloşi, L., Sander, O., Lengauer, T.: Permutation importance: a corrected feature importance measure. Bioinformatics 26, 1340–1347 (2010)
Biecek, P., Burzykowski, T.: Explanatory Model Analysis. Chapman and Hall, New York (2021)
Molnar, C.: Interpretable machine learning. https://christophm.github.io/interpretable-ml-book/ (2019)
Shapley, L.S.: A Value for n-person Games. Princeton University Press, Princeton (2016)
Japkowicz, N., Shah, M.: Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press, New York (2011)
Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: Lof: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 93–104 (2000)
Wu, L.: Android mobile ransomware: bigger, badder, better? https://www.trendmicro.com/en_us/research/17/h/android-mobile-ransomware-evolution.html (2017)
Seals, T.: Slocker android ransomware resurfaces in undetectable form. https://www.infosecurity-magazine.com/news/slocker-android-ransomware/ (2017)
Appendices
Appendix A. Two-sample Kolmogorov-Smirnov statistical test
To formally compare the performance of both feature sets (i.e., extended vs. reduced), the non-parametric two-sample Kolmogorov–Smirnov (K–S) test was used to assess the equality of the two probability distributions. The test statistic quantifies the distance between the empirical distribution functions (EDF) of two data samples. In our case, the distributions of F1 scores obtained with the reduced and extended feature sets are compared. With a p-value of 0.7634, the null hypothesis \(H_0\) that both F1 score samples come from the same distribution cannot be rejected. Moreover, the K–S statistic, which reflects the maximum distance between the EDFs, is relatively small (i.e., K–S = 0.04). The empirical cumulative distribution functions (ECDF) are illustrated in Fig. 15. Thus, it can be concluded that there is no significant difference in detection performance between the compared feature sets.
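The comparison can be reproduced in a few lines. The following is a minimal sketch using SciPy's two-sample K–S test; the arrays f1_reduced and f1_extended are hypothetical stand-ins for the per-period F1 scores obtained with the reduced and extended feature sets, not the actual experimental values.

```python
# Minimal sketch of the two-sample K-S comparison described above.
# The F1 score arrays are hypothetical stand-ins for the per-period
# scores obtained with each feature set.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
f1_reduced = rng.normal(loc=0.93, scale=0.02, size=28)   # e.g., 28 quarters
f1_extended = rng.normal(loc=0.93, scale=0.02, size=28)

res = ks_2samp(f1_reduced, f1_extended)
print(f"K-S statistic: {res.statistic:.3f}, p-value: {res.pvalue:.4f}")
# A large p-value (e.g., > 0.05) means the null hypothesis that both
# samples come from the same distribution cannot be rejected, i.e., no
# significant performance difference between the feature sets.
```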
Appendix B. Wilcoxon signed-rank test
The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test (i.e., no distributional normality is assumed) that enables the comparison of two paired data distributions (specificity and recall importance values in this case). However, the test requires the same number of observations in both samples and, in our particular case, not all features were found important in all periods, so some quarters had missing values for specific features. The quantity of missing values is critical: the data for the reduced vector showed nearly 70% missing values, reaching 83% for the extended vector. Therefore, the statistical analysis was performed only on the reduced feature set data. In this regard, as the negative values of function (4) are unknown but are needed for the test, one solution could be to replace missing values with zeros. Yet, the large missing value ratio might bias the comparison, artificially inflating the similarity of the vector with a large number of missing values. Therefore, a better approach is to replace each missing value with the mean importance of the feature, calculated for the given data set and taken as a negative value. Thus, vectors are compared using means when the number of missing values is high and using the distribution of importance values otherwise.
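The imputation strategy and the paired test can be sketched as follows. This is an illustrative example only: the per-quarter importance values are hypothetical, and SciPy's wilcoxon function is used for the signed-rank test.

```python
# Sketch of the mean-imputation strategy and the paired Wilcoxon
# signed-rank test described above. recall_imp and spec_imp are
# hypothetical per-quarter importance values of a single permission for
# the recall and specificity tasks; NaN marks quarters in which the
# feature was not found important.
import numpy as np
from scipy.stats import wilcoxon

recall_imp = np.array([0.12, np.nan, 0.08, 0.11, np.nan, 0.07])
spec_imp = np.array([0.30, 0.25, np.nan, 0.28, 0.26, 0.31])

def impute_with_negative_mean(values):
    """Replace missing quarters with the feature's mean importance,
    taken as a negative value."""
    mean_importance = np.nanmean(values)
    return np.where(np.isnan(values), -mean_importance, values)

res = wilcoxon(impute_with_negative_mean(recall_imp),
               impute_with_negative_mean(spec_imp))
print(f"Wilcoxon statistic: {res.statistic:.3f}, p-value: {res.pvalue:.4f}")
# A p-value below 0.005 would indicate a highly significant difference
# between the importance distributions of the two recognition tasks.
```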
The results of the Wilcoxon test, calculated for the reduced feature set and used to distinguish the permissions that are important for the specificity and recall tasks, are reported in Table 2. The table lists the features with a p-value below 0.005, which indicates a highly significant difference between the compared vectors. The occurrence column refers to the number of non-missing values in the compared vectors (i.e., the number of quarters in which the feature was found important for recall or specificity). The maximum and total importance are provided for each feature: the maximum importance is the largest importance value the feature reached in any quarter, while the total importance is the cumulative importance of the feature over all quarters. The features are ordered by the completeness of the data vectors (i.e., from the largest to the smallest number of occurrences).
As reported in Table 2, among the 44 shared features found important for both recognition tasks, 29 showed statistically significant differences with a p-value \(< 0.005\). Among these features, whose importance distributions differ significantly between malware detection and benign software recognition, are relevant concept drift-related permissions such as READ_PHONE_STATE, SEND_SMS, and MOUNT_UNMOUNT_FILESYSTEMS. Besides their large total importance, these features show high occurrence and the largest maximum importance values, and in all cases the obtained importance is significantly higher for specificity. Therefore, it can be concluded that features important for benign app recognition (i.e., specificity) are not equally relevant for the malware recognition task (i.e., recall).
Appendix C. Android permission set evolution
Table 3 lists the modifications that have changed the available set of Android security permissions over time, illustrating the dynamic character of the permission set and its evolution throughout Android's lifetime. The permission set was first defined for API level 1 (i.e., the first release of Android) and has been modified constantly since then. Table 3 covers the evolution of the available permission set from API level 1 to API level 30. The table is ordered chronologically by API level release (i.e., from the oldest to the latest) and provides API level-related information such as the release date (Date) and OS version name. For each API level (i.e., each row), the permissions added and deprecated in that API level are given in the Added Permissions and Deprecated Permissions columns. Furthermore, for each added permission, the protection level is provided in the Type column. Three protection levels are distinguished: dangerous, normal and others, reported as D, N and O, respectively. The others category refers to permissions that can be requested by third-party apps but do not belong to the dangerous or normal categories, as defined in the official documentation [21]. If a permission cannot be used by third-party apps, it is marked with a hyphen (-). For each deprecated permission, the API level that introduced it is given in parentheses. Lastly, the Set column reports the number of available permissions (i.e., excluding deprecated ones) in each API level.
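To illustrate how such a reference set can be used for feature extraction, the sketch below encodes an app's requested permissions as a binary feature vector over a fixed reference permission set (e.g., the non-deprecated API level 1 permissions); the permission lists shown are small hypothetical examples, not the full sets of Table 3.

```python
# Sketch: encode an app's requested permissions as a binary feature
# vector over a fixed reference permission set (e.g., the API level 1
# permissions, excluding deprecated ones). The lists below are small
# hypothetical examples, not the full sets reported in Table 3.
REFERENCE_PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.READ_PHONE_STATE",
    "android.permission.SEND_SMS",
    "android.permission.MOUNT_UNMOUNT_FILESYSTEMS",
]

def permission_vector(requested, reference=REFERENCE_PERMISSIONS):
    """Return a 0/1 vector: 1 if the reference permission is requested."""
    requested = set(requested)
    return [int(p in requested) for p in reference]

# Example: permissions declared in a (hypothetical) app manifest.
app_permissions = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "com.example.CUSTOM_PERMISSION",  # permissions outside the reference set are ignored
]
print(permission_vector(app_permissions))  # -> [1, 0, 1, 0]
```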
Cite this article
Guerra-Manzanares, A., Bahsi, H. & Luckner, M. Leveraging the first line of defense: a study on the evolution and usage of android security permissions for enhanced android malware detection. J Comput Virol Hack Tech 19, 65–96 (2023). https://doi.org/10.1007/s11416-022-00432-3