Abstract
When confronted with high-dimensional data, evolutionary feature selection methods encounter the formidable challenge known as the “curse of dimensionality”. To overcome this challenge, our study delves into developing an optimized algorithm, enhanced RIME (ERIME), which ingeniously integrates feature information entropy pruning and the DBSCAN spatial clustering particle evolution method to tackle the intricacies of feature selection. Initially, a dimensionality reduction strategy is introduced based on feature information entropy pruning, effectively limiting the computational burden associated with high-dimensional data. This strategy enhances the algorithm’s search speed and substantially boosts its overall efficiency. Hence, a particle deletion strategy is proposed to safeguard the quality of particles within the population, selectively eliminating particles through multidimensional clustering while monitoring individual fitness differences. Additionally, we incorporate a particle generation strategy rooted in the Markov chain Monte Carlo method, strategically sampling and generating new particles from the distribution of superior particles. The efficacy of the ERIME is rigorously evaluated on 26 benchmark datasets, including 14 high-dimensional datasets and 12 low-dimensional datasets. A comprehensive comparison is conducted with seven cutting-edge intelligent algorithms: IMFO, PSO_GWO, MCGSA, BBPS, QBBA, SSA, and WOA. The experimental results demonstrate the algorithm’s ability to obtain exceptional feature subsets, notably showcasing its competitive edge in resolving high-dimensional feature selection challenges. Regarding average fitness value, ERIME attains a remarkable 79.54% improvement in convergence accuracy compared to RIME. Regarding the average classification error rate, ERIME achieves an impressive 98.54% reduction compared to RIME. Furthermore, in effectively selecting features, ERIME outperforms RIME by reducing the count by 85%. In conclusion, the enhanced RIME algorithm proposed in this study effectively solves the complex problems of high-dimensional feature selection.












Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data availability
No datasets were generated or analysed during the current study.
References
Bennasar M, Hicks Y, Setchi R (2015) Feature selection using joint mutual information maximisation. Expert Syst Appl 42(22):8520–8532
Emary E, Zawbaa HM, Hassanien AE (2016) Binary grey wolf optimization approaches for feature selection. Neurocomputing 172:371–381
Ambusaidi MA et al (2016) Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans Comput 65(10):2986–2998
Ang JC et al (2015) Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection. IEEE/ACM Trans Comput Biol Bioinf 13(5):971–989
Persello C, Bruzzone L (2015) Kernel-based domain-invariant feature selection in hyperspectral images for transfer learning. IEEE Trans Geosci Remote Sens 54(5):2615–2626
Remeseiro B, Bolon-Canedo V (2019) A review of feature selection methods in medical applications. Comput Biol Med 112:103375
Urbanowicz RJ et al (2018) Relief-based feature selection: introduction and review. J Biomed Inform 85:189–203
Xue B et al (2015) A survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626
Wang S et al (2018) Convolutional neural network-based hidden Markov models for rolling element bearing fault identification. Knowl Based Syst 144:65–76
Wang S, Xiang J (2020) A minimum entropy deconvolution-enhanced convolutional neural networks for fault diagnosis of axial piston pumps. Soft Comput 24(4):2983–2997
Yan W-J, Chen Y-H (2018) Measuring dynamic micro-expressions via feature extraction methods. J Comput Sci 25:318–326
Zhang J et al (2021) ROSEFusion: random optimization for online dense reconstruction under fast camera motion. ACM Trans Graph (TOG) 40(4):1–17
Cao B et al (2019) Multiobjective 3-D topology optimization of next-generation wireless data center network. IEEE Trans Ind Inform 16(5):3597–3605
Cao J et al (2023) Reconstruction of full-field dynamic responses for large-scale structures using optimal sensor placement. J Sound Vib 554:117693
Cao B et al (2019) Security-aware industrial wireless sensor network deployment optimization. IEEE Trans Ind Inform 16(8):5309–5316
Wu Q et al (2023) Monte Carlo simulation-based robust workflow scheduling for spot instances in cloud environments. Tsinghua Sci Technol 29(1):112–126
Lyu T et al (2023) Source selection and resource allocation in wireless-powered relay networks: an adaptive dynamic programming-based approach. IEEE Int Things J 11(5):8973–8988
Cao B et al (2020) Diversified personalized recommendation optimization based on mobile data. IEEE Trans Intell Transp Syst 22(4):2133–2139
Xie Y et al (2023) A two-stage estimation of distribution algorithm with heuristics for energy-aware cloud workflow scheduling. IEEE Trans Serv Comput 16(6):4183–4197
Xu X, Wang C, Zhou P (2021) GVRP considered oil-gas recovery in refined oil distribution: from an environmental perspective. Int J Prod Econ 235:108078
Mou J et al (2023) A machine learning approach for energy-efficient intelligent transportation scheduling problem in a real-world dynamic circumstances. IEEE Trans Intell Transp Syst 24(12):15527–15539
Xu X et al (2022) Multi-objective robust optimisation model for MDVRPLS in refined oil distribution. Int J Prod Res 60(22):6772–6792
Xiao Z et al (2023) Multi-objective parallel task offloading and content caching in D2D-aided MEC networks. IEEE Trans Mob Comput 22(11):6599–6615
Li S et al (2023) Hybrid method with parallel-factor theory, a support vector machine, and particle filter optimization for intelligent machinery failure identification. Machines 11(8):837
Cao B et al (2020) RFID reader anticollision based on distributed parallel particle swarm optimization. IEEE Internet Things J 8(5):3099–3107
Zhou X et al (2022) Parameter adaptation-based ant colony optimization with dynamic hybrid mechanism. Eng Appl Artif Intell 114:105139
Yildiz AR et al (2019) A new hybrid Harris hawks-Nelder-Mead optimization algorithm for solving design and manufacturing problems. Mater Test 61(8):735–743
Yang Y et al (2021) Hunger games search: visions, conception, implementation, deep analysis, perspectives, and towards performance shifts. Expert Syst Appl 177:114864
Houssein EH et al (2023) Liver Cancer Algorithm: a novel bio-inspired optimizer. Comput Biol Med 165:107389
Zhu B et al (2023) A critical scenario search method for intelligent vehicle testing based on the social cognitive optimization algorithm. IEEE Trans Intell Transp Sys 24(8):7974–7986
Chen H et al (2022) Slime mould algorithm: a comprehensive review of recent variants and applications. Int J Syst Sci 54(1):204–235
Li S et al (2020) Slime mould algorithm: a new method for stochastic optimization. Future Gener Comput Syst 111:300–323
Heidari AA et al (2019) Harris hawks optimization: algorithm and applications. Future Gener Comput Syst Int J Esci 97:849–872
Tu J et al (2021) The colony predation algorithm. J Bionic Eng 18(3):674–710
Ahmadianfar I et al (2021) RUN beyond the metaphor: an efficient optimization algorithm based on Runge Kutta method. Expert Syst Appl 181:115079
Ahmadianfar I et al (2022) INFO: an efficient optimization algorithm based on weighted mean of vectors. Expert Syst Appl 116516
Hsu H-P, Wang C-N (2021) A hybrid approach combining improved shuffled frog-leaping algorithm with dynamic programming for disassembly process planning. IEEE Access 9:57743–57756
Huang Y, Shen X-N, You X (2021) A discrete shuffled frog-leaping algorithm based on heuristic information for traveling salesman problem. Appl Soft Comput 102:107085
Jadidoleslam M, Ebrahimi A (2015) Reliability constrained generation expansion planning by a modified shuffled frog leaping algorithm. Int J Electr Power Energy Syst 64:743–751
Chen Y, Zhou A (2022) Multiobjective portfolio optimization via Pareto front evolution. Complex Intell Syst 8(5):4301–4317
Zhang C, Zhou L, Li Y (2023) Pareto optimal reconfiguration planning and distributed parallel motion control of mobile modular robots. IEEE Trans Ind Electron 1–10
Got A et al (2023) Improved manta ray foraging optimizer-based SVM for feature selection problems: a medical case study. J Bionic Eng 21(1):409–425
Chen Y et al (2022) Multi-threshold image segmentation using a multi-strategy shuffled frog leaping algorithm. Expert Syst Appl 194:116511
Xue B, Zhang M, Browne WN (2013) Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern 43(6):1656–1671
Ibrahim RA et al (2019) Improved salp swarm algorithm based on particle swarm optimization for feature selection. J Ambient Intell Humaniz Comput 10(8):3155–3169
Song X-F et al (2021) Feature selection using bare-bones particle swarm optimization with mutual information. Patt Recognit 112:107804
Song X-F et al (2021) A fast hybrid feature selection based on correlation-guided clustering and particle swarm optimization for high-dimensional data. IEEE Trans Cybern 52(9):9573–9586
Li A-D, Xue B, Zhang M (2021) Improved binary particle swarm optimization for feature selection with new initialization and search space reduction strategies. Appl Soft Comput 106:107302
Uthayakumar J et al (2020) Financial crisis prediction model using ant colony optimization. Int J Inf Manag 50:538–556
Paniri M, Dowlatshahi MB, Nezamabadi-pour H (2021) Ant-TD: ant colony optimization plus temporal difference reinforcement learning for multi-label feature selection. Swarm Evol Comput 64:100892
Singh U, Singh SN (2019) A new optimal feature selection scheme for classification of power quality disturbances based on ant colony framework. Appl Soft Comput 74:216–225
Zhang Y et al (2019) Spectral features extraction for estimation of soil total nitrogen content based on modified ant colony optimization algorithm. Geoderma 333:23–34
Tabakhi S, Moradi P (2015) Relevance–redundancy feature selection based on ant colony optimization. Pattern Recognit 48(9):2798–2811
Paniri M, Dowlatshahi MB, Nezamabadi-Pour H (2020) MLACO: a multi-label feature selection algorithm based on ant colony optimization. Knowl Based Syst 192:105285
Abdel-Basset M, Ding W, El-Shahat D (2021) A hybrid Harris Hawks optimization algorithm with simulated annealing for feature selection. Artif Intell Rev 54:593–637
Too J, Liang G, Chen H (2022) Memory-based Harris hawk optimization with learning agents: a feature selection approach. Eng Comput 38(Suppl 5):4457–4478
Zhang Y et al (2021) Boosted binary Harris hawks optimizer and feature selection. Eng Comput 37:3741–3770
Hussain K et al (2021) An efficient hybrid sine-cosine Harris hawks optimization for low and high-dimensional feature selection. Expert Syst Appl 176:114778
Long W et al (2022) Lens-imaging learning Harris hawks optimizer for global optimization and its application to feature selection. Expert Syst Appl 202:117255
Zhang Y et al (2020) Binary differential evolution with self-learning for multi-objective feature selection. Inf Sci 507:67–85
Zorarpacı E, Özel SA (2016) A hybrid approach of differential evolution and artificial bee colony for feature selection. Expert Syst Appl 62:91–103
Wan Y et al (2016) A feature selection method based on modified binary coded ant colony optimization algorithm. Appl Soft Comput 49:248–258
Das AK, Das S, Ghosh A (2017) Ensemble feature selection using bi-objective genetic algorithm. Knowl Based Syst 123:116–127
Hu Y et al (2023) A federated feature selection algorithm based on particle swarm optimization under privacy protection. Knowl Based Syst 260:110122
Li A-D, Xue B, Zhang M (2023) Multi-objective particle swarm optimization for key quality feature selection in complex manufacturing processes. Inf Sci 641:119062
Dahou A et al (2023) A social media event detection framework based on transformers and swarm optimization for public notification of crises and emergency management. Technol Forecast Soc Change 192:122546
Li L et al (2023) An evolutionary multitasking algorithm with multiple filtering for high-dimensional feature selection. IEEE Trans Evol Comput 27:802–816
Qu L et al (2023) Explicit and size-adaptive PSO-based feature selection for classification. Swarm Evol Comput 77:101249
Aher CN, Jena AK (2023) Improved invasive weed bird swarm optimization algorithm (IWBSOA) enabled hybrid deep learning classifier for diabetic prediction. J Ambient Intell Humaniz Comput 14(4):3929–3945
Ahadzadeh B et al (2023) SFE: a simple, fast and efficient feature selection algorithm for high-dimensional data. IEEE Trans Evol Comput 27(6):1896–1911
Mafarja M et al (2023) An efficient high-dimensional feature selection approach driven by enhanced multi-strategy grey wolf optimizer for biological data classification. Neural Comput Appl 35(2):1749–1775
Wan Y et al (2023) Adaptive multi-strategy particle swarm optimization for hyperspectral remote sensing image band selection. IEEE Trans Geosci Remote Sens 611–15
Zhou K et al (2023) Data preprocessing strategy in constructing convolutional neural network classifier based on constrained particle swarm optimization with fuzzy penalty function. Eng Appl Artif Intell 117:105580
Sun L et al (2023) TFSFB: two-stage feature selection via fusing fuzzy multi-neighborhood rough set with binary whale optimization for imbalanced data. Inf Fusion 95:91–108
Liu X et al (2023) Adapting feature selection algorithms for the classification of Chinese texts. Systems 11(9):483
Li J et al (2017) Feature selection: a data perspective. ACM Comput Surv (CSUR) 50(6):1–45
Su H et al (2023) RIME: a physics-based optimization. Neurocomputing 532:183–214
Yu X et al (2023) Synergizing the enhanced RIME with fuzzy K-nearest neighbor for diagnose of pulmonary hypertension. Comput Biol Med 165:107408
Cui T-J, Liu S, Li L-L (2016) Information entropy of coding metasurface. Light: Sci Appl 5(11):e16172
Hou J, Gao H, Li X (2016) DSets-DBSCAN: a parameter-free clustering algorithm. IEEE Trans Image Process 25(7):3182–3193
Shen J et al (2016) Real-time superpixel segmentation by DBSCAN clustering algorithm. IEEE Trans Image Process 25(12):5933–5942
Sharma S (2017) Markov chain Monte Carlo methods for Bayesian data analysis in astronomy. Annu Rev Astron Astrophys 55:213–259
Bouchard-Côté A, Vollmer SJ, Doucet A (2018) The bouncy particle sampler: a nonreversible rejection-free Markov chain Monte Carlo method. J Am Stat Assoc 113(522):855–867
Cunningham P, Delany SJ (2021) k-Nearest neighbour classifiers—a tutorial. ACM Comput Surv (CSUR) 54(6):1–25
Wang C et al (2017) Feature selection based on neighborhood discrimination index. IEEE Trans Neural Netw Learn Syst 29(7):2986–2999
Lin Y et al (2016) Multi-label feature selection based on neighborhood mutual information. Appl Soft Comput 38:244–256
Lee J, Kim D-W (2015) Mutual information-based multi-label feature selection using interaction information. Expert Syst Appl 42(4):2013–2025
Tang J, Liu G, Pan Q (2021) A review on representative swarm intelligence algorithms for solving optimization problems: applications and trends. IEEE/CAA J Autom Sin 8(10):1627–1643
Chakraborty A, Kar AK (2017) Swarm intelligence: a review of algorithms. In: Nature-inspired computing and optimization: theory and applications 10:475–494
Slowik A, Kwasnicka H (2017) Nature inspired methods and their industry applications—swarm intelligence algorithms. IEEE Trans Ind Inform 14(3):1004–1015
Galán SF (2019) Comparative evaluation of region query strategies for DBSCAN clustering. Inf Sci 502:76–90
Schönborn S et al (2017) Markov chain Monte Carlo for automated face image analysis. Int J Comput Vis 123:160–183
Minaee S et al (2021) Image segmentation using deep learning: a survey. IEEE Trans Pattern Anal Mach Intell 44(7):3523–3542
Yousif A et al (2019) A survey on sentiment analysis of scientific citations. Artif Intell Rev 52:1805–1838
Golub TR et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
Singh D et al (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209
Chen K-H et al (2014) Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm. BMC Bioinform 15(1):1–10
Cui Y et al (2013) Sparse maximum margin discriminant analysis for feature extraction and gene selection on gene expression data. Comput Biol Med 43(7):933–941
Khan J et al (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6):673–679
Pelusi D et al (2020) An Improved Moth-Flame Optimization algorithm with hybrid search phase. Knowl Based Syst 191:105277
Teng Z-J, Lv J-L, Guo L-W (2019) An improved hybrid grey wolf optimization algorithm. Soft Comput 23:6617–6631
Song Z et al (2017) Multiple chaos embedded gravitational search algorithm. IEICE Trans Inf Syst 100(4):888–900
Liu Z et al (2021) A hybrid genetic-particle swarm algorithm based on multilevel neighbourhood structure for flexible job shop scheduling problem. Comput Oper Res 135:105431
Sharma P, Sharma K (2022) A novel quantum-inspired binary bat algorithm for leukocytes classification in blood smear. Expert Syst 39(3):e12813
Liu Y et al (2022) Simulated annealing-based dynamic step shuffled frog leaping algorithm: optimal performance design and feature selection. Neurocomputing 503:325–362
Peng L et al (2023) Hierarchical Harris hawks optimizer for feature selection. J Adv Res 53:261–278
Leon MA, Kumar S, Bhattacharya S (2002) A comprehensive procedure for performance evaluation of solar food dryers. Renew Sustain Energy Rev 6(4):367–393
Uihlein A, Magagna D (2016) Wave and tidal current energy—a review of the current state of research beyond technology. Renew Sustain Energy Rev 58:1070–1081
Acknowledgements
This work was supported in part by the Natural Science Foundation of Zhejiang Province (LTGS23E070001, LZ22F020005), National Natural Science Foundation of China (62076185, 62301367, 52204263).
Author information
Authors and Affiliations
Contributions
Huangying Wu: contributions: writing—original draft, writing—review and editing, software, visualization, investigation. Yi Chen: contributions: conceptualization, methodology, formal analysis, investigation, writing—review and editing, funding acquisition, supervision. Wei Zhu: contributions: writing—original draft, writing—review and editing, software, visualization, investigation. Zhennao Cai: contributions: writing—original draft, writing—review and editing, software, visualization, investigation. Ali Asghar Heidari: contributions: writing—original draft, writing—review and editing, software, visualization, investigation. Huiling Chen: contributions: conceptualization, methodology, formal analysis, investigation, writing—review and editing, funding acquisition, supervision.
Corresponding authors
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
See Tables 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 and 24.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, H., Chen, Y., Zhu, W. et al. Feature selection in high-dimensional data: an enhanced RIME optimization with information entropy pruning and DBSCAN clustering. Int. J. Mach. Learn. & Cyber. 15, 4211–4254 (2024). https://doi.org/10.1007/s13042-024-02143-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13042-024-02143-1