
Feature selection using symmetric uncertainty and hybrid optimization for high-dimensional data

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

When handling high-dimensional data, searching for an optimal subset of selected features has become extremely difficult because the search space grows exponentially with the number of features, and most existing feature selection models neglect the interactions among features or between a feature and the decision class. This paper develops a novel feature selection approach using symmetric uncertainty and hybrid optimization (FSUHO) for high-dimensional data. First, to fully capture the interactions among features and between features and the decision class, the F-relevance between features and the C-correlation between a feature and the decision class are constructed based on symmetric uncertainty and used to remove redundant features. A strong-correlation threshold based on the C-correlation and a random coefficient is then introduced to prevent effective features from being removed in this first stage. Second, to reduce the computational cost, one criterion for judging weakly correlated features is designed to sort all features, and another criterion is developed to select class centers. The similarity between features and class centers is calculated, and similar features are clustered into one class, yielding a symmetric uncertainty correlation-based feature clustering model in this second stage. In the third stage, a hybrid optimization approach combining the particle swarm optimizer (PSO) and the wild horse optimizer (WHO) is proposed for feature selection: an association-guided group initialization probability with a multiobjective optimized particle selection scheme serves as the criterion by which the PSO selects stallion particles for the WHO, and the WHO is improved by integrating a nonlinear inertia weight factor and a Brownian motion operator to obtain the optimal subset of selected features. Finally, a novel three-stage feature selection algorithm is developed.
Experimental results on 16 datasets demonstrate the efficiency of FSUHO in tackling high-dimensional feature selection problems in terms of classification accuracy and running time.
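Both the F-relevance and the C-correlation above are built on symmetric uncertainty, SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), a normalized mutual information that lies in [0, 1]. The following minimal sketch computes SU for discrete variables; the function names and toy data are illustrative, not taken from the paper:

```python
import numpy as np
from collections import Counter

def entropy(values):
    # Shannon entropy (base 2) of a discrete variable
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(x, y):
    # SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y)
    denom = hx + hy
    return 0.0 if denom == 0 else 2.0 * mutual_info / denom

# A feature identical to the class gets SU = 1; a weakly related one, much less.
cls = [0, 0, 1, 1, 0, 1, 0, 1]
feat_good = [0, 0, 1, 1, 0, 1, 0, 1]
feat_noise = [0, 1, 0, 1, 0, 1, 0, 1]
su_good = symmetric_uncertainty(feat_good, cls)    # 1.0
su_noise = symmetric_uncertainty(feat_noise, cls)  # much smaller
```

Computed against the decision class this is the C-correlation; computed between two features it is the F-relevance used to detect redundancy.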
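The third stage improves the WHO with a nonlinear inertia weight factor and a Brownian motion operator. The paper's exact formulas are not reproduced here; the sketch below uses a common quadratic inertia decay and a Gaussian (Brownian) perturbation purely for illustration, with all parameter values assumed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def nonlinear_inertia(t, t_max, w_max=0.9, w_min=0.4):
    # Nonlinear (quadratic) decay from w_max to w_min over the run;
    # an illustrative schedule, not necessarily the paper's exact form.
    return w_max - (w_max - w_min) * (t / t_max) ** 2

def brownian_step(position, scale=0.05):
    # Brownian motion operator: perturb a candidate solution vector
    # with Gaussian noise, then clip back into the [0, 1] search box.
    noise = scale * rng.standard_normal(position.shape)
    return np.clip(position + noise, 0.0, 1.0)

pos = rng.random(10)                  # 10-dimensional candidate solution
w0 = nonlinear_inertia(0, 100)        # 0.9 at the first iteration
wT = nonlinear_inertia(100, 100)      # 0.4 at the last iteration
new_pos = brownian_step(pos)          # perturbed, still inside [0, 1]
```

Decaying the inertia nonlinearly keeps exploration high early on and accelerates convergence late, while the Brownian perturbation helps particles escape local optima in high-dimensional feature spaces.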




Data availability

The datasets in Table 1 can be downloaded from http://featureselection.asu.edu/datasets.php, http://portals.broadinstitute.org/cgi-bin/cancer/datasets.cgi, and http://archive.ics.uci.edu/ml/.


Funding

The authors would like to express their sincere appreciation to the anonymous reviewers for their insightful comments, which greatly improved the quality of this paper. This research was funded by the National Natural Science Foundation of China under Grants 62076089, 61772176, 61976082, and 61976120; and the Natural Science Key Foundation of Jiangsu Education Department under Grant 21KJA510004.

Author information


Corresponding authors

Correspondence to Lin Sun or Weiping Ding.

Ethics declarations

Conflict of interest

The work described is not under consideration for publication elsewhere; all the necessary files have been uploaded online; each author has participated sufficiently in the work; and all the listed authors have approved the enclosed manuscript.


Ethical approval

The authors state that this research complies with ethical standards. This research does not involve either human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, L., Sun, S., Ding, W. et al. Feature selection using symmetric uncertainty and hybrid optimization for high-dimensional data. Int. J. Mach. Learn. & Cyber. 14, 4339–4360 (2023). https://doi.org/10.1007/s13042-023-01897-4

