Abstract
With the exponential growth in Android apps, Android-based devices are increasingly targeted by attackers in the “silent battle” of cybernetics. Protecting Android-based devices from malware has become more complex and more crucial for academicians and researchers. The main vulnerability lies in the underlying permission model of Android apps, which request a permission or a set of permissions at installation time. In this study, we consider permissions and API calls as features for developing a malware detection model. To select appropriate features or feature sets from thirty different categories of Android apps, we implemented ten distinct feature selection approaches. Using the selected feature sets, we developed distinct models with five different unsupervised machine learning algorithms. We conducted an experiment on 500,000 distinct Android apps belonging to thirty distinct categories. Empirical results reveal that the model built with rough set analysis as the feature selection approach and farthest first as the machine learning algorithm achieved the highest detection rate, 98.8%, in detecting malware from real-world apps.
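The farthest first algorithm named above can be illustrated with a minimal sketch of farthest-first traversal over binary feature vectors. This is not the authors' implementation: the Hamming distance, the choice of the first app as the initial center, the function names, and the toy data are all assumptions made for illustration. Mapping the resulting clusters to malware or benign labels is not shown.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def farthest_first(points: List[Sequence[int]], k: int) -> Tuple[List[int], List[int]]:
    """Cluster `points` into k groups with farthest-first traversal.

    Returns (center_indices, labels), where labels[i] is the position in
    center_indices of the center nearest to points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    centers = [0]  # assumption: start from the first point for determinism
    # Distance from every point to its nearest chosen center so far.
    nearest = [hamming(p, points[0]) for p in points]

    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], hamming(p, points[nxt]))

    # Assign every point to its closest center.
    labels = [
        min(range(k), key=lambda c: hamming(p, points[centers[c]]))
        for p in points
    ]
    return centers, labels


# Hypothetical toy data: each row is one app, each column one selected
# feature (1 = permission or API call present). Not taken from the paper.
apps = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
]
centers, labels = farthest_first(apps, k=2)
print(centers, labels)
```

In this toy run the first two apps fall into one cluster and the last two into the other. In practice a feature selection step would first reduce the columns, and the clusters would then be compared against known labels to compute a detection rate.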
Notes
Mahindru, Arvind (2020), “Android permissions dataset, Android Malware and benign Application Data set (consist of permissions and API calls)”, Mendeley Data, V3, doi: 10.17632/b4mxg7ydb7.3.
Testing was performed on a local system.
The user's live location is shown on Google Maps, which is pre-installed on Android-based devices.
Malware families are identified by VirusTotal.
A data set DS is composed of a set (O) of n objects described by a set (SA) of l attributes.
\(grid\_list\) consists of attributes.
About this article
Cite this article
Mahindru, A., Sangal, A.L. SemiDroid: a behavioral malware detector based on unsupervised machine learning techniques using feature selection approaches. Int. J. Mach. Learn. & Cyber. 12, 1369–1411 (2021). https://doi.org/10.1007/s13042-020-01238-9
DOI: https://doi.org/10.1007/s13042-020-01238-9