Abstract
The set of permissions that an Android app requests at installation time serves as the feature set in permission-based detection of Android malware. This high-dimensional feature set should be reduced to an optimal subset of features to minimize computational overhead. Selecting meaningful attributes has become an essential step in mining high-dimensional data, and the application of heuristic feature selection algorithms is a main research direction in this field. The “quality of classification” measure comes from rough set theory and can be combined with bio-inspired heuristic search techniques (particle swarm optimization, genetic algorithms, etc.) to select optimal or near-optimal feature subsets. In this work, a feature selection technique based on rough sets and an improvised particle swarm optimization (PSO) algorithm is proposed for permission-based detection of Android malware. The main contribution is a new random key encoding method, used in the proposed technique (PSORS-FS) to convert the classical PSO algorithm to the discrete domain. The encoding also mitigates the issues associated with binary PSO, namely the maximum particle velocity and the sigmoid function. PSORS-FS ensures diversity in the search process and reduces the tendency toward premature convergence. Datasets from the UCI and KEEL machine learning repositories and two Android permission datasets were used to evaluate the proposed method. For most of the machine learning classifiers tested, the proposed method yielded better classification performance than conventional filter and wrapper methods.
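The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PSORS-FS implementation. The first function computes the standard rough-set quality of classification (the dependency degree, i.e. the fraction of objects whose feature values on the chosen subset determine the class unambiguously). The second shows one common form of random key decoding, in which a continuous particle position is sorted and the features with the smallest keys are selected. The function names, the fixed subset size `k`, and the toy permission data are all assumptions made for illustration; the abstract does not specify how PSORS-FS decodes its keys.

```python
from collections import defaultdict


def quality_of_classification(rows, labels, subset):
    """Rough-set dependency degree: share of objects in the positive region.

    rows   -- list of feature tuples (here: 0/1 permission flags)
    labels -- class label for each row
    subset -- indices of the features under evaluation
    """
    if not rows:
        return 0.0
    # Group objects that are indiscernible on the chosen features.
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[j] for j in subset)].append(i)
    # A block belongs to the positive region if all its objects share one label.
    positive = sum(
        len(members)
        for members in blocks.values()
        if len({labels[i] for i in members}) == 1
    )
    return positive / len(rows)


def decode_random_keys(position, k):
    """Assumed decoding rule: select the k features with the smallest keys."""
    order = sorted(range(len(position)), key=lambda j: position[j])
    return sorted(order[:k])


# Hypothetical toy data: three permissions, five apps.
rows = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 1, 0), (1, 1, 1)]
labels = ["malware", "malware", "benign", "benign", "malware"]

# A continuous particle position is decoded into a feature subset,
# which is then scored by the rough-set measure.
particle = [0.7, 0.1, 0.4]
chosen = decode_random_keys(particle, k=2)
score = quality_of_classification(rows, labels, chosen)
```

Because the particle stays in continuous space and only the decoding step is discrete, a scheme of this kind needs neither the sigmoid transfer function nor the velocity clamping that binary PSO relies on, which is consistent with the motivation given in the abstract.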







Cite this article
Bhattacharya, A., Goswami, R.T. & Mukherjee, K. A feature selection technique based on rough set and improvised PSO algorithm (PSORS-FS) for permission based detection of Android malwares. Int. J. Mach. Learn. & Cyber. 10, 1893–1907 (2019). https://doi.org/10.1007/s13042-018-0838-1
DOI: https://doi.org/10.1007/s13042-018-0838-1