Abstract
Machine learning in high-dimensional datasets is far more complicated than in low-dimensional ones, and Feature Selection (FS) is necessary to reduce the complexity of learning. However, FS in high-dimensional datasets is itself a complex process that requires combining several search techniques. The Chimp Optimization Algorithm (ChOA) is a recent meta-heuristic inspired by the individual intelligence and sexual motivation of chimps in cooperative hunting. Its continuous version is mainly employed to solve complex continuous optimization problems, while its binary version is frequently used to solve difficult binary optimization problems. Both versions of ChOA are prone to premature convergence and cannot effectively solve high-dimensional optimization problems. This paper proposes the Binary Improved ChOA Algorithm (BICHOA) for solving bi-objective, high-dimensional FS problems (i.e., high-dimensional FS problems that aim to maximize the classifier's accuracy while minimizing the number of selected features). BICHOA improves the performance of ChOA using four new exploration and exploitation techniques. First, it employs opposition-based learning to create an initial population of diverse binary feasible solutions. Second, it incorporates a Lévy mutation function into the main probabilistic update function of ChOA to boost its search and exploration capabilities. Third, it uses an iterative exploration technique based on an exploratory local search method, the \(\beta\)-hill climbing algorithm. Finally, it employs a new binary time-varying transfer function to derive binary feasible solutions from the continuous solutions generated by the update equations of ChOA and the \(\beta\)-hill climbing algorithm.
BICHOA's performance was assessed against six machine learning classifiers, five integer programming methods, and nine efficient, popular optimization algorithms on 25 real-world high-dimensional datasets from various domains. According to the overall experimental findings, BICHOA achieved the highest accuracy, the best objective value, and the fewest selected features on each of the 25 datasets. In addition, the reliability of the experimental findings was confirmed using the Friedman and Wilcoxon statistical tests.
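Two of the techniques named above (opposition-based initialization and a binary time-varying transfer function) can be sketched in a few lines. This is an illustrative sketch only: the V-shaped transfer shape, the `tau` schedule, and all parameter values here are assumptions for demonstration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(42)

def opposition_init(pop_size, n_features):
    """Opposition-based initialization for binary FS (illustrative):
    pair each random binary solution with its bitwise opposite,
    doubling the diversity of the initial pool."""
    X = rng.integers(0, 2, size=(pop_size, n_features))
    return np.vstack([X, 1 - X])  # solutions plus their opposites

def time_varying_transfer(x, t, t_max, tau_min=0.1, tau_max=4.0):
    """Hypothetical time-varying V-shaped transfer function: the
    control parameter tau shrinks over iterations, so the mapping
    sharpens and the search shifts from exploration to exploitation."""
    tau = tau_max - (tau_max - tau_min) * t / t_max
    return np.abs(np.tanh(x / tau))

def binarize(x_cont, t, t_max):
    """Map continuous positions to a binary feature mask by sampling
    against the transfer probabilities."""
    p = time_varying_transfer(x_cont, t, t_max)
    return (rng.random(x_cont.shape) < p).astype(int)
```

In a typical loop, `binarize` would be applied to the continuous positions produced by the ChOA update equations at each iteration `t`, yielding the binary feature subset that is then evaluated by the classifier.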












Data availability
The data is available at https://archive.ics.uci.edu/datasets.
Author information
Contributions
Nour Elhuda A. Al-qudah contributed to conceptualization, methodology, investigation, validation, writing (original draft preparation), and supervision. Bilal H. Abed-alguni contributed to experimentation, visualization, writing (original draft preparation), and reviewing and editing. Malek Barhoush contributed to experimentation, visualization, and reviewing and editing.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: comparison between ML classifiers and BICHOA based on the true skill statistic
The True Skill Statistic (TSS) is one of the most useful quantitative measures of performance because it accounts for both types of error (omission and commission errors). TSS values range from − 1 to 1: a value in [− 1, 0] suggests that performance is random and unreliable, while a value of 1 indicates perfect agreement (fully reliable performance). In Table 28, the TSS measure is used to compare BICHOA with four ML approaches (SVM, KNN, DT, and LR). Several observations can be drawn from the table. First, all TSS values are greater than 0, and most are 1 or close to 1, which indicates that the performance of these algorithms is not random. Second, BICHOA scored the highest TSS values for 17 of the 25 datasets compared to the other ML classifiers. In addition, most of BICHOA's TSS values exceed 0.5, and most are 1 or near 1.
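The TSS described above is computed as sensitivity plus specificity minus one, which is how it attains the [−1, 1] range and separates omission errors (missed positives) from commission errors (false positives). A minimal sketch from a binary confusion matrix:

```python
def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1.
    Ranges from -1 to 1; 1 means perfect agreement, and values at or
    below 0 indicate performance no better than random."""
    sensitivity = tp / (tp + fn)  # omission errors (fn) lower this term
    specificity = tn / (tn + fp)  # commission errors (fp) lower this term
    return sensitivity + specificity - 1
```

For example, a perfect classifier (no false negatives or false positives) gives a TSS of 1, while a coin-flip classifier that is right half the time in each class gives a TSS of 0.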
Appendix B: comparison between four ML classifiers embedded in BICHOA
In this experiment, four classifiers (SVM, KNN, DT, and LR) were embedded individually into BICHOA, producing four variants (BICHOA-SVM, BICHOA-KNN, BICHOA-DT, and BICHOA-LR). The purpose is to determine which ML classifier performs best within BICHOA. Table 29 summarizes the overall results (accuracy, precision, recall, and F1 score) of the four variants over the 25 high-dimensional datasets. Overall, BICHOA-SVM was the best variant, scoring the best results on 22 of the 25 datasets. This suggests that SVM is the best choice of classifier for BICHOA; hence, BICHOA-SVM was used for comparison with the baselines in Sect. 5.
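Embedding a classifier in a wrapper FS method like BICHOA means the classifier's accuracy feeds a bi-objective fitness that also penalizes subset size. The weighted-sum form below is a common formulation in the FS literature (with `alpha` typically near 0.99), shown here as an assumption rather than the paper's exact objective:

```python
def fs_fitness(mask, accuracy, alpha=0.99):
    """Illustrative weighted bi-objective FS fitness (to be minimized):
    alpha * classification error + (1 - alpha) * fraction of features
    kept. Lower is better; any embedded classifier (SVM, KNN, DT, LR)
    supplies the accuracy term."""
    n_selected = sum(mask)
    n_total = len(mask)
    if n_selected == 0:          # an empty subset is infeasible
        return float("inf")
    error = 1.0 - accuracy
    return alpha * error + (1 - alpha) * n_selected / n_total
```

Because `alpha` is close to 1, accuracy dominates, and the small feature-count term breaks ties in favor of sparser subsets, matching the bi-objective goal of maximizing accuracy while minimizing the number of selected features.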
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Al-qudah, N.E.A., Abed-alguni, B.H. & Barhoush, M. Bi-objective feature selection in high-dimensional datasets using improved binary chimp optimization algorithm. Int. J. Mach. Learn. & Cyber. 15, 6107–6148 (2024). https://doi.org/10.1007/s13042-024-02308-y