Feature selection in high-dimensional data: an enhanced RIME optimization with information entropy pruning and DBSCAN clustering

Wu, Huangying; Chen, Yi; Zhu, Wei; Cai, Zhennao; Heidari, Ali Asghar; Chen, Huiling

doi:10.1007/s13042-024-02143-1


Original Article
Published: 24 April 2024

Volume 15, pages 4211–4254 (2024)

International Journal of Machine Learning and Cybernetics

Huangying Wu¹, Yi Chen¹, Wei Zhu², Zhennao Cai¹, Ali Asghar Heidari³ & Huiling Chen¹


Abstract

When confronted with high-dimensional data, evolutionary feature selection methods face the formidable challenge known as the “curse of dimensionality”. To address this challenge, this study develops an enhanced RIME algorithm (ERIME) that integrates feature information entropy pruning and a DBSCAN spatial-clustering-based particle evolution method for feature selection. First, a dimensionality reduction strategy based on feature information entropy pruning is introduced to limit the computational burden of high-dimensional data, which speeds up the search and improves the algorithm's overall efficiency. Second, a particle deletion strategy is proposed to maintain the quality of the particles in the population: particles are selectively eliminated through multidimensional clustering while individual fitness differences are monitored. Third, a particle generation strategy based on the Markov chain Monte Carlo method samples new particles from the distribution of superior particles. ERIME is evaluated on 26 benchmark datasets (14 high-dimensional and 12 low-dimensional) and compared with seven state-of-the-art intelligent algorithms: IMFO, PSO_GWO, MCGSA, BBPS, QBBA, SSA, and WOA. The experimental results show that ERIME obtains high-quality feature subsets and is particularly competitive on high-dimensional feature selection problems. Compared with RIME, ERIME improves convergence accuracy in terms of average fitness value by 79.54%, reduces the average classification error rate by 98.54%, and reduces the number of selected features by 85%.
In conclusion, the proposed ERIME algorithm effectively addresses complex high-dimensional feature selection problems.
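The abstract describes pruning features by information entropy before the search starts, but this preview does not state the exact criterion. The Python sketch below assumes one common entropy-based filter: each feature is discretized into equal-width bins and ranked by information gain, IG(Y; X) = H(Y) - H(Y | X), and only the top `keep_ratio` fraction is handed to the optimizer. The helper names (`entropy_prune`, `information_gain`, `discretize`), the bin count, and `keep_ratio` are hypothetical, not ERIME's published settings.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def discretize(column, bins=5):
    """Equal-width binning of a numeric column into integer bin indices."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)  # constant feature: everything in one bin
    width = (hi - lo) / bins
    # min(..., bins - 1) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), bins - 1) for v in column]


def information_gain(column, labels, bins=5):
    """IG(Y; X) = H(Y) - H(Y | X) for one discretized feature column."""
    xs = discretize(column, bins)
    n = len(labels)
    conditional = 0.0
    for b in set(xs):
        subset = [y for x, y in zip(xs, labels) if x == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def entropy_prune(X, y, keep_ratio=0.1, bins=5):
    """Hypothetical pruning step: indices (ascending) of the top
    keep_ratio fraction of features ranked by information gain."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y, bins) for j in range(n_features)]
    k = max(1, round(keep_ratio * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: only feature 0 separates the two classes.
X = [[0.1, 5, 3, 1], [0.2, 4, 3, 2], [0.9, 5, 3, 1], [1.0, 4, 3, 2]]
y = [0, 0, 1, 1]
print(entropy_prune(X, y, keep_ratio=0.25))  # → [0]
```

Ranking by information gain rather than by the raw entropy H(X) of each feature is a deliberate choice in this sketch: a high-entropy feature is merely variable, whereas information gain measures how much the feature reveals about the class label.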
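The particle deletion strategy clusters the population and removes particles while watching fitness differences. A minimal sketch, assuming binary particles compared by Hamming distance, a plain DBSCAN implementation, and a hypothetical deletion rule: inside each cluster the fittest particle (lowest fitness, minimization) is kept, members whose fitness is within `tol` of it are deleted as near-duplicates, and DBSCAN noise points are always kept. `prune_population`, `eps`, `min_pts`, and `tol` are illustrative names and values, not the paper's.

```python
def hamming(a, b):
    """Number of positions where two binary particles differ."""
    return sum(x != y for x, y in zip(a, b))


def dbscan(points, eps, min_pts, dist):
    """Minimal DBSCAN: one cluster label per point, -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        # epsilon-neighbourhood, including the point itself
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels


def prune_population(pop, scores, eps=1, min_pts=2, tol=1e-3):
    """Hypothetical deletion rule: return indices of particles to keep."""
    labels = dbscan(pop, eps, min_pts, hamming)
    best = {}  # cluster label -> index of its fittest particle
    for i, c in enumerate(labels):
        if c != -1 and (c not in best or scores[i] < scores[best[c]]):
            best[c] = i
    keep = []
    for i, c in enumerate(labels):
        if c == -1 or i == best[c] or scores[i] - scores[best[c]] > tol:
            keep.append(i)
    return keep


pop = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
scores = [0.10, 0.1005, 0.30, 0.20]
print(prune_population(pop, scores))  # → [0, 2, 3]
```

In this toy run, particle 1 sits in the same dense cluster as particle 0 with almost the same fitness, so it is the one removed; the isolated particle 3 survives as noise, which preserves exploration of distant regions.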
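For the Markov chain Monte Carlo generation step, the sketch below assumes that the "distribution of superior particles" is approximated by independent per-bit Bernoulli probabilities estimated from elite particles (with Laplace smoothing), and draws new binary particles from it with a Metropolis-Hastings chain using single-bit-flip proposals. The target model, `burn_in`, and the function names are assumptions for illustration, not the paper's exact sampler.

```python
import math
import random


def elite_marginals(elite, alpha=1.0):
    """Per-bit selection probability among elite particles, Laplace-smoothed
    so that no probability is exactly 0 or 1 (assumed target model)."""
    n, dim = len(elite), len(elite[0])
    return [(sum(p[d] for p in elite) + alpha) / (n + 2 * alpha) for d in range(dim)]


def log_target(x, probs):
    """Log-probability of binary vector x under independent Bernoulli(probs)."""
    return sum(math.log(p) if bit else math.log(1.0 - p) for bit, p in zip(x, probs))


def mh_sample(elite, n_samples, burn_in=100, rng=None):
    """Metropolis-Hastings chain over binary particles."""
    rng = rng or random.Random()
    probs = elite_marginals(elite)
    x = list(rng.choice(elite))  # start the chain at a random elite particle
    current = log_target(x, probs)
    samples, step = [], 0
    while len(samples) < n_samples:
        d = rng.randrange(len(x))  # flipping a uniformly chosen bit is a symmetric proposal
        x[d] ^= 1
        proposed = log_target(x, probs)
        # accept with probability min(1, pi(x') / pi(x)); the proposal ratio cancels
        if rng.random() < math.exp(min(0.0, proposed - current)):
            current = proposed
        else:
            x[d] ^= 1  # reject: undo the flip
        step += 1
        if step > burn_in:
            samples.append(list(x))
    return samples


elite = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1]]
print(elite_marginals(elite))  # → [0.8, 0.6, 0.2, 0.4]
new_particles = mh_sample(elite, 5, rng=random.Random(0))
```

Because this particular target factorizes per bit, it could also be sampled directly; the Metropolis-Hastings form is shown because it keeps working unchanged if `log_target` is replaced by a richer, non-factorized model of the elite particles.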


Data availability

No datasets were generated or analysed during the current study.


Acknowledgements

This work was supported in part by the Natural Science Foundation of Zhejiang Province (LTGS23E070001, LZ22F020005) and the National Natural Science Foundation of China (62076185, 62301367, 52204263).

Author information

Authors and Affiliations

Institute of Big Data and Information Technology, Wenzhou University, Wenzhou, 325035, China
Huangying Wu, Yi Chen, Zhennao Cai & Huiling Chen
School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
Wei Zhu
School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran
Ali Asghar Heidari


Contributions

Huangying Wu: writing – original draft, writing – review and editing, software, visualization, investigation. Yi Chen: conceptualization, methodology, formal analysis, investigation, writing – review and editing, funding acquisition, supervision. Wei Zhu: writing – original draft, writing – review and editing, software, visualization, investigation. Zhennao Cai: writing – original draft, writing – review and editing, software, visualization, investigation. Ali Asghar Heidari: writing – original draft, writing – review and editing, software, visualization, investigation. Huiling Chen: conceptualization, methodology, formal analysis, investigation, writing – review and editing, funding acquisition, supervision.

Corresponding authors

Correspondence to Yi Chen or Huiling Chen.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 and 24.

Table 6 Comparison of the developed ERIME with other binary meta-heuristic algorithms in terms of average fitness values for high-dimensional datasets
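Table 6 reports average fitness values, but this page does not define the fitness function. Wrapper feature selection studies commonly minimize a weighted sum of the classification error rate and the fraction of selected features, f = α·error + (1 - α)·|S|/|D|. The sketch below assumes that form with α = 0.99 and a leave-one-out 1-nearest-neighbour classifier; both the weighting and the classifier are assumptions, not values confirmed here.

```python
import math


def knn_loo_error(X, y, mask, k=1):
    """Leave-one-out error of a k-NN classifier restricted to the
    features where mask[j] == 1 (assumed evaluation classifier)."""
    feats = [j for j, m in enumerate(mask) if m]
    if not feats:
        return 1.0  # an empty subset is treated as useless
    errors = 0
    for i in range(len(X)):
        dists = []
        for t in range(len(X)):
            if t != i:
                d = math.sqrt(sum((X[i][j] - X[t][j]) ** 2 for j in feats))
                dists.append((d, y[t]))
        dists.sort(key=lambda p: p[0])
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)  # majority vote
        errors += pred != y[i]
    return errors / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Assumed fitness (lower is better): weighted error plus subset size."""
    err = knn_loo_error(X, y, mask)
    return alpha * err + (1 - alpha) * sum(mask) / len(mask)


X = [[0.1, 5, 3, 1], [0.2, 4, 3, 2], [0.9, 5, 3, 1], [1.0, 4, 3, 2]]
y = [0, 0, 1, 1]
print(knn_loo_error(X, y, [1, 0, 0, 0]))        # → 0.0
print(round(fitness(X, y, [1, 0, 0, 0]), 4))    # → 0.0025
```

With α close to 1, the error term dominates and the subset-size term mainly breaks ties between equally accurate subsets, which is why a large reduction in selected features can coexist with a lower fitness value.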
