A feature selection technique based on rough set and improvised PSO algorithm (PSORS-FS) for permission based detection of Android malwares

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

The set of permissions that an Android app requests at installation time serves as the feature set in permission-based detection of Android malware. This high-dimensional feature set should be reduced to an optimal subset of features to minimize computational overhead. Selecting meaningful attributes has become an essential step in mining high-dimensional data, and the application of heuristic feature selection algorithms is a main research direction in this field. The “quality of classification” measure, which comes from rough set theory, can be combined with bio-inspired heuristic search techniques (particle swarm optimization, genetic algorithms, etc.) to select optimal or near-optimal subsets of features. In this work, a feature selection technique based on rough sets and an improvised particle swarm optimization (PSO) algorithm is proposed for selecting features in permission-based detection of Android malware. The main contribution is a new random key encoding method, which the proposed technique (PSORS-FS) uses to convert the classical PSO algorithm to the discrete domain. This encoding also reduces the issues associated with the maximum velocity of particles and with the sigmoid function in binary PSO. PSORS-FS ensures diversity in the search process and reduces the tendency toward premature convergence. Datasets from the UCI and KEEL machine learning repositories and two Android permission datasets were used to evaluate the performance of the proposed method. The proposed method yielded better classification performance than conventional filter and wrapper methods for most of the machine learning classifiers.
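As a rough illustration of the “quality of classification” measure mentioned in the abstract, the sketch below computes the standard rough set dependency degree: the fraction of objects whose equivalence class, under the chosen feature subset, maps to a single decision label. This is a minimal sketch of the textbook definition, not the authors' implementation. The function name, the toy decision table, and the three binary "permission" columns are all hypothetical, and the sketch does not cover the PSO search or the random key encoding.

```python
from collections import defaultdict


def quality_of_classification(rows, labels, subset):
    """Rough set dependency degree: |positive region| / |universe|.

    rows   : list of tuples of discrete feature values (one tuple per app)
    labels : list of decision labels, aligned with rows
    subset : list of column indices forming the candidate feature subset
    """
    if not rows:
        raise ValueError("empty decision table")

    # Objects with identical values on every selected feature are
    # indiscernible and fall into the same equivalence class.
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        classes[key].append(label)

    # The positive region contains the objects of every equivalence
    # class whose members all share one decision label.
    positive = sum(
        len(members) for members in classes.values() if len(set(members)) == 1
    )
    return positive / len(rows)


# Hypothetical toy table: each column is a binary permission flag.
rows = [
    (1, 0, 1),
    (1, 0, 0),
    (0, 1, 1),
    (0, 1, 0),
    (1, 1, 1),
]
labels = ["malware", "malware", "benign", "benign", "malware"]

for subset in ([0], [1], [2], [0, 1, 2]):
    print(subset, quality_of_classification(rows, labels, subset))
```

In this toy table the first column alone separates the two labels, so it scores as well as the full feature set. A heuristic search such as PSO would use a score of this kind, possibly combined with a penalty on subset size, as the fitness of each candidate subset.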

Author information

Corresponding author

Correspondence to Abhishek Bhattacharya.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bhattacharya, A., Goswami, R.T. & Mukherjee, K. A feature selection technique based on rough set and improvised PSO algorithm (PSORS-FS) for permission based detection of Android malwares. Int. J. Mach. Learn. & Cyber. 10, 1893–1907 (2019). https://doi.org/10.1007/s13042-018-0838-1

  • DOI: https://doi.org/10.1007/s13042-018-0838-1
