Feature selection based on rough set approach, wrapper approach, and binary whale optimization algorithm

  • Original Article

International Journal of Machine Learning and Cybernetics

Abstract

The principle of any approach to the feature selection problem is to find a subset of the original features. Since finding a minimal subset of the features is an NP-hard problem, practical and efficient heuristic algorithms need to be developed. The whale optimization algorithm is a recently developed nature-inspired meta-heuristic that imitates the hunting behavior of humpback whales and was designed to solve continuous optimization problems. In this paper, we propose a novel binary whale optimization algorithm (BWOA) to solve the feature selection problem. BWOA is especially desirable and appealing for feature selection whenever no heuristic information is available to guide the search toward the optimal minimal subset; instead, the whales locate good feature subsets as they hunt the prey. Rough set theory (RST) is one of the effective approaches for feature selection. In the first experiment we use RST with BWOA, and in the second experiment we use a wrapper approach with BWOA on three different classifiers for feature selection. We also verify the performance and effectiveness of the proposed algorithm on 32 datasets from the UCI machine learning repository and compare it with several powerful existing algorithms from the literature. Furthermore, we employ two nonparametric statistical tests, the Wilcoxon signed-rank test and the Friedman test, at the 5% significance level. Our results show that the proposed algorithm provides an efficient tool for finding a minimal subset of the features.

References

  1. Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. European conference on machine learning. Springer, New York, pp 137–142

  2. Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. In: Proceedings of the Fourteenth International Conference on Machine Learning, ICML ’97. Morgan Kaufmann Publishers Inc., San Francisco, pp 412–420. http://dl.acm.org/citation.cfm?id=645526.657137

  3. Jain A, Zongker D (1997) Feature selection: evaluation, application, and small sample performance. IEEE Trans Pattern Anal Mach Intell 19(2):153–158

  4. Mitra P, Murthy C, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312

  5. Rui Y, Huang TS, Chang S-F (1999) Image retrieval: current techniques, promising directions, and open issues. J Vis Commun Image Represent 10(1):39–62

  6. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517

  7. Model F, Adorjan P, Olek A, Piepenbrock C (2001) Feature selection for dna methylation based cancer classification. Bioinformatics 17(suppl 1):S157–S164

  8. Dash M, Liu H (2003) Consistency-based search in feature selection. Artif Intell 151(1–2):155–176

  9. Jensen R (2005) Combining rough and fuzzy sets for feature selection, Ph.D. thesis, Citeseer

  10. Liu H, Motoda H (1998) Feature extraction, construction and selection: a data mining perspective, vol 453. Springer, New York

  11. Somol P, Pudil P, Kittler J (2004) Fast branch & bound algorithms for optimal feature selection. IEEE Trans Pattern Anal Mach Intell 26(7):900–912

  12. Zhong N, Dong J, Ohsuga S (2001) Using rough sets with heuristics for feature selection. J Intell Inf Syst 16(3):199–214

  13. Lai C, Reinders MJ, Wessels L (2006) Random subspace method for multivariate feature selection. Pattern Recogn Lett 27(10):1067–1076

  14. Modrzejewski M (1993) Feature selection using rough sets theory. European Conference on Machine Learning. Springer, New York, pp 213–226

  15. Neumann J, Schnörr C, Steidl G (2005) Combined svm-based feature selection and classification. Mach Learn 61(1–3):129–150

  16. Gasca E, Sánchez JS, Alonso R (2006) Eliminating redundancy and irrelevance using a new mlp-based feature selection method. Pattern Recogn 39(2):313–315

  17. Xie Z-X, Hu Q-H, Yu D-R (2006) Improved feature selection algorithm based on svm and correlation. International symposium on neural networks. Springer, New York, pp 1373–1380

  18. Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1(1–4):131–156

  19. Fodor IK (2002) A survey of dimension reduction techniques. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory 9:1–18

  20. Neshatian K, Zhang M (2009) Genetic programming for feature subset ranking in binary classification problems. European conference on genetic programming. Springer, New York, pp 121–132

  21. Zhu Z, Ong Y-S, Dash M (2007) Wrapper-filter feature selection algorithm using a memetic framework. IEEE Trans Syst Man Cybern Part B 37(1):70–76

  22. Huang J, Cai Y, Xu X (2007) A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recogn Lett 28(13):1825–1844

  23. Chen S-C, Lin S-W, Chou S-Y (2011) Enhancing the classification accuracy by scatter-search-based ensemble approach. Appl Soft Comput 11(1):1021–1028

  24. Jue W, Qi Z, Hedar A, Ibrahim AM (2014) A rough set approach to feature selection based on scatter search metaheuristic. J Syst Sci Complex 27(1):157–168. https://doi.org/10.1007/s11424-014-3298-z

  25. Lin S-W, Lee Z-J, Chen S-C, Tseng T-Y (2008) Parameter determination of support vector machine and feature selection using simulated annealing approach. Appl Soft Comput 8(4):1505–1512

  26. Hedar A-R, Ibrahim A-MM, Abdel-Hakim AE, Sewisy AA (2018) Modulated clustering using integrated rough sets and scatter search attribute reduction. In: Proceedings of the genetic and evolutionary computation conference companion, GECCO ’18. ACM, New York, pp 1394–1401. https://doi.org/10.1145/3205651.3208286

  27. Tabakhi S, Moradi P, Akhlaghian F (2014) An unsupervised feature selection algorithm based on ant colony optimization. Eng Appl Artif Intell 32:112–123

  28. Yusta SC (2009) Different metaheuristic strategies to solve the feature selection problem. Pattern Recogn Lett 30(5):525–534

  29. Hedar A, Wang J, Fukushima M (2008) Tabu search for attribute reduction in rough set theory. Soft Comput 12(9):909–918

  30. Al-Ani A, Alsukker A, Khushaba RN (2013) Feature subset selection using differential evolution and a wheel based search strategy. Swarm Evol Comput 9:15–26

  31. Khushaba RN, Al-Ani A, Al-Jumaily A (2011) Feature subset selection using differential evolution and a statistical repair mechanism. Expert Syst Appl 38(9):11515–11526

  32. Rodrigues D, Pereira LA, Nakamura RY, Costa KA, Yang X-S, Souza AN, Papa JP (2014) A wrapper approach for feature selection based on bat algorithm and optimum-path forest. Expert Syst Appl 41(5):2250–2258

  33. Yazdani S, Shanbehzadeh J, Aminian E (2013) Feature subset selection using constrained binary/integer biogeography-based optimization. ISA Trans 52(3):383–390. 10.1016/j.isatra.2012.12.005. http://www.sciencedirect.com/science/article/pii/S0019057812001991

  34. Chuang L-Y, Yang C-H, Li J-C (2011) Chaotic maps based on binary particle swarm optimization for feature selection. Appl Soft Comput 11(1):239–248

  35. Inbarani HH, Azar AT, Jothi G (2014) Supervised hybrid feature selection based on pso and rough sets for medical diagnosis. Comput Methods Programs Biomed 113(1):175–185

  36. Emary E, Zawbaa HM, Hassanien AE (2016) Binary grey wolf optimization approaches for feature selection. Neurocomputing 172:371–381

  37. Mirjalili S (2016) Dragonfly algorithm: a new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Comput Appl 27(4):1053–1073. https://doi.org/10.1007/s00521-015-1920-1

  38. Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer Academic Publishers, Dordrecht

  39. Wang X, Yang J, Teng X, Xia W, Jensen R (2007) Feature selection based on rough sets and particle swarm optimization. Pattern Recogn Lett 28(4):459–471. 10.1016/j.patrec.2006.09.003. http://www.sciencedirect.com/science/article/pii/S0167865506002327

  40. Polkowski L, Tsumoto S, Lin TY (2000) Rough set methods and applications: new developments in knowledge discovery in information systems, vol 56 of studies in fuzziness and soft computing. Physica-Verlag, Heidelberg

  41. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67. 10.1016/j.advengsoft.2016.01.008. http://www.sciencedirect.com/science/article/pii/S0965997816300163

  42. Mafarja M, Mirjalili S (2018) Whale optimization approaches for wrapper feature selection. Appl Soft Comput 62:441–453. 10.1016/j.asoc.2017.11.006. http://www.sciencedirect.com/science/article/pii/S1568494617306695

  43. Mafarja MM, Mirjalili S (2017) Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing 260:302–312. 10.1016/j.neucom.2017.04.053. http://www.sciencedirect.com/science/article/pii/S092523121730807X

  44. Eid HF (2018) Binary whale optimisation: an effective swarm algorithm for feature selection. Int J Metaheuristics 7(1):67–79. https://doi.org/10.1504/IJMHEUR.2018.091880

  45. Ke L, Feng Z, Ren Z (2008) An efficient ant colony optimization approach to attribute reduction in rough set theory. Pattern Recogn Lett 29(9):1351–1357

  46. Jensen R, Shen Q (2004) Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches. IEEE Trans Knowl Data Eng 16(12):1457–1471

  47. Yumin C, Duoqian M, Ruizhi W (2010) A rough set approach to feature selection based on ant colony optimization. Pattern Recogn Lett 31(3):226–233. 10.1016/j.patrec.2009.10.013. http://www.sciencedirect.com/science/article/pii/S0167865509002888

  48. Le Cessie S, Van Houwelingen JC (1992) Ridge estimators in logistic regression. Appl Stat 41:191–201

  49. Hosmer D, Lemeshow S, Sturdivant R (2013) Applied logistic regression, Wiley Series in Probability and Statistics, Wiley. https://books.google.ca/books?id=bRoxQBIZRd4C

  50. Salzberg SL (1994) C4.5: programs for machine learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach Learn 16(3):235–240. https://doi.org/10.1007/BF00993309

  51. Jantan H, Hamdan AR, Othman ZA (2010) Human talent prediction in hrm using c4.5 classification algorithm. Int J Comput Sci Eng 2(8):2526–2534

  52. Lewis DD (1998) Naive (bayes) at forty: the independence assumption in information retrieval. In: Nédellec C, Rouveirol C (eds) Machine learning: ECML-98. Springer, Berlin, Heidelberg, pp 4–15

  53. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer, New York. https://doi.org/10.1007/978-0-387-84858-7

  54. Mirjalili SM, Yang X-S (2014) Binary bat algorithm. Neural Comput Appl 25(3):663–681. https://doi.org/10.1007/s00521-013-1525-5

  55. Mirjalili S, Wang GG, Coelho LS (2014) Binary optimization using hybrid particle swarm optimization and gravitational search algorithm. Neural Comput Appl 25(6):1423–1435

  56. Kaveh A, Ghazaan MI (2017) Enhanced whale optimization algorithm for sizing optimization of skeletal structures. Mech Based Design Struct Mach 45(3):345–362. https://doi.org/10.1080/15397734.2016.1213639

  57. Mirjalili S, Lewis A (2013) S-shaped versus v-shaped transfer functions for binary particle swarm optimization. Swarm Evol Comput 9:1–14. 10.1016/j.swevo.2012.09.002. http://www.sciencedirect.com/science/article/pii/S2210650212000648

  58. Inbarani H, Bagyamathi M, Azar A (2015) A novel hybrid feature selection method based on rough set and improved harmony search. Neural Comput Appl 26(8):1859–1880. https://doi.org/10.1007/s00521-015-1840-0

  59. Swiniarski R, Skowron A (2003) Rough set methods in feature selection and recognition. Pattern Recogn Lett 24(6):833–849. 10.1016/S0167-8655(02)00196-4. http://www.sciencedirect.com/science/article/pii/S0167865502001964

  60. Nakamura RYM, Pereira LAM, Costa KA, Rodrigues D, Papa JP, Yang XS (2012) Bba: a binary bat algorithm for feature selection. In: 2012 25th SIBGRAPI conference on graphics, patterns and images, pp 291–297. https://doi.org/10.1109/SIBGRAPI.2012.47

  61. Ming H (2008) A rough set based hybrid method to feature selection. Int Symp Knowl Acquis Model 2008:585–588. https://doi.org/10.1109/KAM.2008.12

  62. Bae C, Yeh W-C, Chung YY, Liu S-L (2010) Feature selection with intelligent dynamic swarm and rough set. Expert Syst Appl 37(10):7026–7032

  63. Pawlak Z (1997) Rough set approach to knowledge-based decision support. Eur J Oper Res 99(1):48–57. 10.1016/S0377-2217(96)00382-7. http://www.sciencedirect.com/science/article/pii/S0377221796003827

  64. Manish S (2002) Rough-fuzzy functions in classification. Fuzzy Sets Syst 132:353–369

  65. Chen Y, Miao D, Wang R, Wu K (2011) A rough set approach to feature selection based on power set tree. Knowl Based Syst 24(2):275–281. 10.1016/j.knosys.2010.09.004. http://www.sciencedirect.com/science/article/pii/S0950705110001498

  66. Kohavi R, Sommerfield D (1995) Feature subset selection using the wrapper method: overfitting and dynamic search space topology. In: KDD, pp 192–197

  67. Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml/index.php

  68. Chen Y, Miao D, Wang R (2010) A rough set approach to feature selection based on ant colony optimization. Pattern Recogn Lett 31(3):226–233

  69. Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18. 10.1016/j.swevo.2011.02.002. http://www.sciencedirect.com/science/article/pii/S2210650211000034

  70. Alcala-Fdez J et al (2011) Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Comput 17(2-3):255–287. http://www.keel.es/

  71. Yao Y, Zhao Y (2008) Attribute reduction in decision-theoretic rough set models. Inf Sci 178(17):3356–3373

  72. Cervante L, Xue B, Shang L, Zhang M (2013) Binary particle swarm optimisation and rough set theory for dimension reduction in classification. IEEE Congr Evol Comput 2013:2428–2435. https://doi.org/10.1109/CEC.2013.6557860

  73. Li W, Yang Y (2002) How many genes are needed for a discriminant microarray data analysis. Methods of microarray data analysis. Springer, New York, pp 137–149

  74. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537

  75. Hwang K-B, Cho D-Y, Park S-W, Kim S-D, Zhang B-T (2002) Applying machine learning techniques to analysis of gene expression data: cancer diagnosis. Methods of microarray data analysis. Springer, New York, pp 167–182

  76. Yu L, Liu H (2003) Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Proceedings of the 20th international conference on machine learning (ICML-03), pp 856–863

  77. Hall MA (1999) Correlation-based feature selection for machine learning. University of Waikato, Hamilton

  78. Wang Y, Makedon F (2004) Application of relief-f feature filtering algorithm to selecting informative genes for cancer classification using microarray data. In: Computational systems bioinformatics conference. CSB 2004. Proceedings. 2004 IEEE. IEEE, pp 497–498

  79. Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Nat Acad Sci 99(10):6567–6572

  80. Ding C, Peng H (2005) Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol 3(02):185–205

  81. Chen M (2016) Pattern recognition and machine learning toolbox. http://www.mathworks.com/matlabcentral/fileexchange/55826-pattern-recognition-and-machine-learning-toolbox

Acknowledgements

We would like to thank the anonymous reviewers for their valuable suggestions and comments, which helped improve the quality of the paper. The research of the first author is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC). The postdoctoral fellowship of the second author is supported by NSERC.

Author information

Corresponding author

Correspondence to Mohamed A. Tawhid.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1. Logistic regression (LR) [48, 49]  A common classification method, also known as the exponential or log-linear classifier. LR is a discriminative learning classifier that directly estimates the parameters of the posterior distribution P(c|x). The algorithm assumes that P(c|x) is given by Eq. (22),

$$\begin{aligned} P(c=k|x) = \frac{\exp \left( w_k^Tx\right) }{\sum \nolimits _{j=1}^{K}\exp \left( w_j^Tx\right) } \end{aligned}$$
(22)

where the \(w_j\) are the parameters to estimate and K is the number of classes. The maximum likelihood method is then used to approximate the \(w_j\). Since the Hessian matrix of the logistic regression model is positive definite, the error function has a unique minimum. In the proposed system, LR is used as a classifier to assess the goodness of the selected features; the best feature combination is the one with the maximum classification performance and the minimum number of selected features. Note that we use the Pattern Recognition and Machine Learning Toolbox (PRML) in Matlab, which provides logistic regression functions for both binary and multiclass classification problems [81].
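
As a minimal illustration of Eq. (22), the sketch below computes the class posteriors of a multiclass logistic regression model with NumPy. It is not the PRML Toolbox code used in the paper; the weight matrix and feature vector are hypothetical and assumed to have already been estimated by maximum likelihood.

```python
import numpy as np

def lr_posterior(W, x):
    """Class posteriors P(c = k | x) according to Eq. (22).

    W : (K, d) array with one weight vector w_k per class.
    x : (d,) feature vector.
    Returns a length-K array of probabilities summing to 1.
    """
    scores = W @ x                        # w_k^T x for every class k
    scores -= scores.max()                # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # softmax normalization

# Hypothetical usage: 3 classes, 4 selected features
W = np.array([[ 0.2, -0.1,  0.5,  0.0],
              [ 0.1,  0.3, -0.2,  0.4],
              [-0.3,  0.2,  0.1, -0.1]])
x = np.array([1.0, 0.5, -1.0, 2.0])
print(lr_posterior(W, x))  # posterior probabilities over the 3 classes
```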

2. C4.5 decision tree classifier [50, 51]  C4.5 belongs to the decision tree family; it can produce both decision trees and rule sets, and it constructs the tree with the aim of improving prediction accuracy.

C4.5 uses two heuristic criteria to rank possible tests: information gain, an attribute selection measure that minimizes the total entropy of the subsets \(S_i\), and the gain ratio (the default criterion), which divides the information gain by the information provided by the test outcomes. The information gain computation, described by the function Gain(A), proceeds as follows (a short code sketch is given after the list):

  • Select the attribute with the highest information gain.

  • S contains \(S_i\) tuples of class \(C_i\), for \(i = 1,\ldots , m\).

  • The information measure, or expected information, required to classify an arbitrary tuple is:

    $$\begin{aligned} I(S_1, S_2,\ldots ,S_m)=-\sum _{i=1}^{m}\frac{S_i}{S}\log _2\frac{S_i}{S}. \end{aligned}$$
    (23)
  • The entropy of attribute A with values \(a_1,a_2,\ldots ,a_v\) is:

    $$\begin{aligned} E(A)=\sum _{j=1}^{v}\frac{S_{1j}+\cdots +S_{mj}}{S} I(S_{1j},\ldots ,S_{mj}). \end{aligned}$$
    (24)
  • The information gain is the amount gained by branching on attribute A:

    $$\begin{aligned} Gain(A)=I(S_1, S_2,\ldots ,S_m)-E(A). \end{aligned}$$
    (25)
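
The following short Python sketch implements the Gain(A) computation of Eqs. (23)–(25) for a categorical attribute. It is an illustrative implementation rather than the code used in the paper; the toy attribute values and class labels are hypothetical.

```python
import numpy as np
from collections import Counter

def info(labels):
    """Expected information I(S_1, ..., S_m) of a set of class labels, Eq. (23)."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gain(values, labels):
    """Information gain of attribute A, Eqs. (24) and (25).

    values : value a_j of attribute A for each tuple in S.
    labels : class label of each tuple in S.
    """
    n = len(labels)
    # E(A): weighted expected information of the partition induced by A, Eq. (24)
    e_a = 0.0
    for v in set(values):
        subset = [c for a, c in zip(values, labels) if a == v]
        e_a += len(subset) / n * info(subset)
    return info(labels) - e_a  # Gain(A) = I(S_1, ..., S_m) - E(A), Eq. (25)

# Hypothetical example: attribute "outlook" against a binary class label
outlook = ["sunny", "sunny", "rain", "overcast", "rain", "overcast"]
play    = ["no",    "no",    "yes",  "yes",      "yes",  "yes"]
print(gain(outlook, play))
```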

3. Naïve Bayes (NB) [52, 53]  Naïve Bayes is a simple Bayesian classification algorithm that has proven to be a useful and powerful machine learning approach in classification studies. The NB classifier is highly scalable, requiring a number of parameters that is linear in the number of variables (features/predictors) of the learning problem. Maximum-likelihood training can be performed by evaluating a closed-form expression, which takes linear time, rather than by the expensive iterative approximation used for many other types of classifiers.
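
As an illustration of the closed-form maximum-likelihood training described above, the sketch below gives a minimal Gaussian Naïve Bayes classifier. It is one possible instantiation of NB for continuous features, not the implementation used in the experiments, and the toy data are hypothetical.

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian Naive Bayes. Training is a single closed-form pass
    (class priors, per-feature means and variances), so it takes time linear
    in the number of samples and features."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_, self.means_, self.vars_ = {}, {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            self.priors_[c] = len(Xc) / len(X)
            self.means_[c] = Xc.mean(axis=0)
            self.vars_[c] = Xc.var(axis=0) + 1e-9  # avoid zero variance
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # log P(c) + sum_i log N(x_i | mean, var); features assumed independent
            scores = {
                c: np.log(self.priors_[c])
                   - 0.5 * np.sum(np.log(2 * np.pi * self.vars_[c])
                                  + (x - self.means_[c]) ** 2 / self.vars_[c])
                for c in self.classes_
            }
            preds.append(max(scores, key=scores.get))
        return np.array(preds)

# Hypothetical usage on toy data
X = np.array([[1.0, 2.0], [1.2, 1.9], [3.0, 4.1], [3.2, 3.9]])
y = np.array([0, 0, 1, 1])
print(GaussianNB().fit(X, y).predict(X))  # expected: [0 0 1 1]
```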

About this article

Cite this article

Tawhid, M.A., Ibrahim, A.M. Feature selection based on rough set approach, wrapper approach, and binary whale optimization algorithm. Int. J. Mach. Learn. & Cyber. 11, 573–602 (2020). https://doi.org/10.1007/s13042-019-00996-5
