Abstract
Driven by the popularity of free apps, Android has become the most widely used smartphone operating system, which has in turn attracted cyber-criminals who build malware-infected apps that steal vital information from these devices. The most critical problem is to detect malware-infected apps and keep them out of the Google Play Store. The vulnerability lies in the underlying permission model of Android apps. Consequently, it has become the responsibility of app developers to specify precisely the permissions that their apps will demand at installation and execution time. In this study, we examine the permission-induced risk that arises from granting unnecessary permissions to Android apps. The experimental work includes the development of an effective malware detection system, which is used to investigate the detection capability of several well-known and widely used feature sets for malware detection. To select the best features from our collected feature data set, we implement ten distinct feature selection approaches. We then develop the malware detection model using the LSSVM (Least Square Support Vector Machine) learning approach with three distinct kernel functions: linear, radial basis, and polynomial. Experiments were performed on 200,000 distinct Android apps. The empirical results reveal that the model built using LSSVM with the RBF (radial basis function) kernel, named FSdroid, detects 98.8% of malware when compared with distinct anti-virus scanners and achieves a 3% higher detection rate than the frameworks and approaches proposed in the literature.















Notes
Malware families are identified by VirusTotal.
In this study, we use min-max normalization to normalize the data. This approach applies a linear transformation that maps each data point \(D_{q_{i}}\) of feature Q to a normalized value \(Normalized(D_{q_{i}})\) lying between 0 and 1. The normalized value of \(D_{q_{i}}\) is computed as:
$$Normalized(D_{q_{i}})=\frac{D_{q_{i}}-min(Q)}{max(Q)-min(Q)},$$
where min(Q) and max(Q) are the minimum and maximum values of feature Q, respectively.
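The min-max formula can be illustrated with a minimal NumPy sketch. The function name `min_max_normalize` and the handling of constant features (mapped to 0 to avoid division by zero) are assumptions made for this illustration; the source does not say how such features are treated.

```python
import numpy as np

def min_max_normalize(features):
    """Scale each column (feature Q) of a 2-D array into the range [0, 1].

    Implements Normalized(D) = (D - min(Q)) / (max(Q) - min(Q)) per column.
    Assumption: a constant column (max(Q) == min(Q)) is mapped to all zeros.
    """
    features = np.asarray(features, dtype=float)
    col_min = features.min(axis=0)
    col_max = features.max(axis=0)
    span = col_max - col_min
    # Replace zero spans with 1 so constant columns divide safely to 0.
    safe_span = np.where(span == 0, 1.0, span)
    return (features - col_min) / safe_span

# Hypothetical feature matrix: rows are apps, columns are features.
sample = [[2, 10], [4, 10], [6, 10]]
print(min_max_normalize(sample))
```

Normalizing per feature keeps attributes with large raw ranges from dominating distance-based kernels such as the RBF kernel.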
The names of the extracted feature sets are available to researchers and academicians at https://github.com/ArvindMahindru66/Computer-and-security-dataset.
In our study, we fixed T = 3 and d = 5 for the experiments with the linear and polynomial kernels.
In our study, we fixed γ = 10 for the experiments with the RBF kernel.
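For readers unfamiliar with LSSVM, the following is a minimal sketch of the standard LSSVM classifier formulation, in which training reduces to solving a single linear system. It is not the authors' implementation. Several points are assumptions made for illustration: γ is interpreted here as the RBF kernel coefficient in exp(−γ‖x − z‖²), the regularization constant `reg` is a hypothetical value, and the function names are invented for this example. The parameters T and d of the other kernels are not used.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def lssvm_fit(X, y, gamma=10.0, reg=1.0):
    """Solve the LSSVM classifier system for the bias b and multipliers alpha.

    System: [[0, y^T], [y, Omega + I/reg]] [b, alpha]^T = [0, 1]^T,
    where Omega_ij = y_i * y_j * K(x_i, x_j) and labels y are in {-1, +1}.
    """
    n = X.shape[0]
    omega = np.outer(y, y) * rbf_kernel(X, X, gamma)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = y
    system[1:, 0] = y
    system[1:, 1:] = omega + np.eye(n) / reg
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]

def lssvm_predict(X_train, y_train, b, alpha, X_new, gamma=10.0):
    """Classify new points with sign(sum_i alpha_i * y_i * K(x, x_i) + b)."""
    K = rbf_kernel(X_new, X_train, gamma)
    return np.sign(K @ (alpha * y_train) + b)

# Hypothetical toy data in [0, 1]^2: -1 = benign, +1 = malware.
X = np.array([[0.1, 0.1], [0.2, 0.1], [0.1, 0.2],
              [0.9, 0.9], [0.8, 0.9], [0.9, 0.8]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
b, alpha = lssvm_fit(X, y, gamma=10.0, reg=1.0)
print(lssvm_predict(X, y, b, alpha, np.array([[0.15, 0.15], [0.85, 0.85]])))
```

Unlike a classical SVM, which requires a quadratic program, LSSVM uses equality constraints, so training is a single call to a linear solver. The cost is that every training sample receives a nonzero multiplier.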
Performance parameters are calculated on the basis of the training and testing data sets.
To perform experiment we collect 1000 distinct Android apps from real-world
Cite this article
Mahindru, A., Sangal, A. FSDroid:- A feature selection technique to detect malware from Android using Machine Learning Techniques. Multimed Tools Appl 80, 13271–13323 (2021). https://doi.org/10.1007/s11042-020-10367-w
DOI: https://doi.org/10.1007/s11042-020-10367-w