Abstract
High-dimensional data make it extremely difficult to search for an optimal feature subset, because the search space grows exponentially with the number of features, and most existing feature selection models neglect the interactions among features or between a feature and the decision class. This paper develops a novel feature selection approach using symmetric uncertainty and hybrid optimization (FSUHO) for high-dimensional data. In the first stage, to fully reflect the interactions among features and between features and the decision class, the F-relevance between features and the C-correlation between a feature and the decision class are constructed from the symmetric uncertainty to remove redundant features; a strong-correlation threshold is then improved using the C-correlation and a random coefficient to prevent effective features from being removed. In the second stage, to reduce the expensive computational cost, one criterion for judging a weakly correlated feature is designed to sort all features, and another criterion is developed to select the class centers. The similarity between features and class centers is calculated, and similar features are clustered into one class, which yields a symmetric uncertainty correlation-based feature clustering model. In the third stage, a hybrid optimization approach combining the particle swarm optimizer (PSO) and the wild horse optimizer (WHO) is proposed for feature selection: an association-guided group initialization probability with a multiobjective optimized particle selection scheme is defined as the criterion by which the PSO selects stallion particles for the WHO, and the WHO is improved by integrating a nonlinear inertia weight factor and a Brownian motion operator to obtain the optimal subset of selected features. Together, these stages form a novel three-stage feature selection algorithm. Experimental results on 16 datasets demonstrate the efficiency of FSUHO on high-dimensional feature selection problems in terms of classification accuracy and running time.
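The abstract does not state the formulas behind the C-correlation and F-relevance, so the following is only a minimal sketch of the standard symmetric uncertainty measure, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), on which such quantities are commonly built. It assumes discrete (or already discretized) features; the function names `entropy` and `symmetric_uncertainty` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Standard SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        # Both variables are constant; this sketch treats them as uninformative.
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


# Toy data (hypothetical): f1 determines the label, f2 is independent of it.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [0, 0, 1, 1]

# SU between each feature and the class label (a C-correlation-style score).
c_corr = {
    "f1": symmetric_uncertainty(f1, label),
    "f2": symmetric_uncertainty(f2, label),
}
# SU between two features (an F-relevance-style score).
f_rel = symmetric_uncertainty(f1, f2)
```

In this toy case `f1` receives the maximal score against the label and `f2` the minimal one, which is the kind of ranking a first-stage filter could threshold. How FSUHO sets its improved threshold with the random coefficient is not specified in the abstract and is not modeled here.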
Data availability
The datasets in Table 1 can be downloaded from http://featureselection.asu.edu/datasets.php, http://portals.broadinstitute.org/cgi-bin/cancer/datasets.cgi, and http://archive.ics.uci.edu/ml/.
Funding
The authors would like to express their sincere appreciation to the anonymous reviewers for their insightful comments, which greatly improved the quality of this paper. This research was funded by the National Natural Science Foundation of China under Grants 62076089, 61772176, 61976082, and 61976120; and the Natural Science Key Foundation of Jiangsu Education Department under Grant 21KJA510004.
Ethics declarations
Conflict of interest
The work described is not under consideration for publication elsewhere; all the necessary files have been uploaded online; each author has participated sufficiently; and all the listed authors have approved the enclosed manuscript.
Ethical approval
The authors state that this research complies with ethical standards. This research does not involve either human participants or animals.
About this article
Cite this article
Sun, L., Sun, S., Ding, W. et al. Feature selection using symmetric uncertainty and hybrid optimization for high-dimensional data. Int. J. Mach. Learn. & Cyber. 14, 4339–4360 (2023). https://doi.org/10.1007/s13042-023-01897-4
DOI: https://doi.org/10.1007/s13042-023-01897-4