Abstract
Feature selection is an important step in data mining for reducing the dimensionality of datasets. Because feature selection is inherently an NP-hard problem, no deterministic algorithm is known that solves it in acceptable time. Meta-heuristic algorithms are reliable alternatives for solving such problems in acceptable time, and a large number of feature selection algorithms based on meta-heuristic optimization have been proposed in the literature. In this work, a new feature selection algorithm based on the artificial rabbits optimization (ARO) meta-heuristic and the DBSCAN clustering algorithm with automatic adjustment of its input parameters (ARO-DBSCAN) is proposed. Using auxiliary algorithms to improve the performance of a meta-heuristic can increase the risk of getting stuck in local optima. The method proposed here improves the performance of ARO on the feature selection problem without increasing the probability of getting stuck in local optima: the density-based DBSCAN clustering algorithm strengthens the exploitation of ARO in the search space while preserving its exploration. As a result, the performance of ARO on feature selection problems increases significantly. The proposed algorithm is compared with eight state-of-the-art feature selection algorithms on UCI benchmark datasets and three real-world high-dimensional datasets. The experimental results show that ARO-DBSCAN performs better within an appropriate execution time. Moreover, on high-dimensional data, the proposed method significantly reduces the number of selected features, which makes the analysis of these datasets more efficient. The source code of the proposed algorithm is publicly available at https://github.com/alihamdipour/ARO-DBSCAN.
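For intuition, the minimal Python sketch below (using scikit-learn) illustrates the two ingredients the abstract names: a wrapper-style fitness that trades classification accuracy against the number of selected features, and a density-based DBSCAN clustering of the rabbit population. The helper names (fitness, cluster_rabbits, binarize), the KNN wrapper classifier, and the alpha, eps, and min_samples defaults are illustrative assumptions, not the authors' implementation; the actual ARO update rules and the automatic DBSCAN parameter adjustment are given in the paper and the linked repository.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, alpha=0.99):
    """Hypothetical wrapper fitness: weigh classification error against
    the fraction of selected features (alpha is an assumed trade-off)."""
    if not mask.any():                       # an empty feature subset is invalid
        return 1.0
    knn = KNeighborsClassifier(n_neighbors=5)
    acc = cross_val_score(knn, X[:, mask], y, cv=5).mean()
    return alpha * (1.0 - acc) + (1.0 - alpha) * mask.sum() / mask.size

def cluster_rabbits(positions, eps=0.5, min_samples=3):
    """Density-based grouping of the ARO population (one row per rabbit):
    members of the same DBSCAN cluster can share information during the
    similarity agent update; the label -1 marks noise points."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(positions)

def binarize(position, threshold=0.5):
    """Map a continuous rabbit position to a boolean feature mask."""
    return np.asarray(position) > threshold
```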








Data availability and access
The datasets analyzed during the current study are publicly available in the UCI repository, https://archive.ics.uci.edu/datasets.
References
Fan C, Chen M, Wang X, Wang J, Huang B (2021) A review on data preprocessing techniques toward efficient and reliable knowledge discovery from building operational data. Front Energy Res 9:652801
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
Abdulwahab HM, Ajitha S, Saif MAN (2022) Feature selection techniques in the context of big data: taxonomy and analysis. Appl Intell 52(12):13568–13613
Dokeroglu T, Deniz A, Kiziloz HE (2022) A comprehensive survey on recent metaheuristics for feature selection. Neurocomputing 494:269–296
Abramson D, Abela J (1991) A parallel genetic algorithm for solving the school timetabling problem. Citeseer
Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: MHS'95. Proceedings of the Sixth International Symposium on Micro Machine and Human Science, pp. 39–43. IEEE
Juneja M, Nagar S (2016) Particle swarm optimization algorithm and its parameters: A review. In: 2016 International Conference on Control, Computing, Communication and Materials (ICCCCM), pp. 1–5. IEEE
Jain M, Singh V, Rani A (2019) A novel nature-inspired algorithm for optimization: Squirrel search algorithm. Swarm Evol Comput 44:148–175
Abdollahzadeh B, Soleimanian Gharehchopogh F, Mirjalili S (2021) Artificial gorilla troops optimizer: a new nature-inspired metaheuristic algorithm for global optimization problems. Int J Intell Syst 36(10):5887–5958
Zervoudakis K, Tsafarakis S (2020) A mayfly optimization algorithm. Comput Ind Eng 145:106559
Kaveh A, Farhoudi N (2013) A new optimization method: Dolphin echolocation. Adv Eng Softw 59:53–70
Pan W-T (2012) A new fruit fly optimization algorithm: taking the financial distress model as an example. Knowl-Based Syst 26:69–74
Moosavi SHS, Bardsiri VK (2017) Satin bowerbird optimizer: A new optimization algorithm to optimize ANFIS for software development effort estimation. Eng Appl Artif Intell 60:1–15
Yang X-S (2012) Flower pollination algorithm for global optimization. In: Unconventional Computation and Natural Computation: 11th International Conference, UCNC 2012, Orléans, France, September 3-7, 2012. Proceedings 11, pp. 240–249. Springer
Koçer HG, Türkoğlu B, Uymaz SA (2023) Chaotic golden ratio guided local search for big data optimization. Eng Sci Technol Int J 41:101388
Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61
Turkoglu B, Uymaz SA, Kaya E (2024) Chaotic artificial algae algorithm for solving global optimization with real-world space trajectory design problems. Arab J Sci Eng 1–28
Uymaz O, Turkoglu B, Kaya E, Asuroglu T (2024) A novel diversity guided galactic swarm optimization with feedback mechanism. IEEE Access
Uymaz SA, Tezel G, Yel E (2015) Artificial algae algorithm (AAA) for nonlinear global optimization. Appl Soft Comput 31:153–171
Muthiah-Nakarajan V, Noel MM (2016) Galactic swarm optimization: a new global optimization metaheuristic inspired by galactic motion. Appl Soft Comput 38:771–787
Wang L, Cao Q, Zhang Z, Mirjalili S, Zhao W (2022) Artificial rabbits optimization: A new bio-inspired meta-heuristic algorithm for solving engineering optimization problems. Eng Appl Artif Intell 114:105082
Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 226–231
Oreski S, Oreski G (2014) Genetic algorithm-based heuristic for feature selection in credit risk assessment. Expert Syst Appl 41(4):2052–2064
Wang Y, Chen X, Jiang W, Li L, Li W, Yang L, Liao M, Lian B, Lv Y, Wang S et al (2011) Predicting human microRNA precursors based on an optimized feature subset generated by GA-SVM. Genomics 98(2):73–78
Khammassi C, Krichen S (2017) A GA-LR wrapper approach for feature selection in network intrusion detection. Comput Secur 70:255–277
Wang X, Yang J, Teng X, Xia W, Jensen R (2007) Feature selection based on rough sets and particle swarm optimization. Pattern Recogn Lett 28(4):459–471
Chen L-F, Su C-T, Chen K-H, Wang P-C (2012) Particle swarm optimization for feature selection with application in obstructive sleep apnea diagnosis. Neural Comput Appl 21:2087–2096
Zhou Y, Lin J, Guo H (2021) Feature subset selection via an improved discretization-based particle swarm optimization. Appl Soft Comput 98:106794
Yang H, Du Q, Chen G (2012) Particle swarm optimization-based hyperspectral dimensionality reduction for urban land cover classification. IEEE J Sel Topics Appl Earth Obs Remote Sensing 5(2):544–554
Pramanik R, Sarkar S, Sarkar R (2022) An adaptive and altruistic PSO-based deep feature selection method for pneumonia detection from chest X-rays. Appl Soft Comput 128:109464
Dorigo M, Di Caro G (1999) Ant colony optimization: a new meta-heuristic. In: Proceedings of the 1999 Congress on Evolutionary computation-CEC99 (Cat. No. 99TH8406), vol. 2, pp. 1470–1477. IEEE
Sivagaminathan RK, Ramakrishnan S (2007) A hybrid approach for feature subset selection using neural networks and ant colony optimization. Expert Syst Appl 33(1):49–60
Kanan HR, Faez K (2008) An improved feature selection method based on ant colony optimization (ACO) evaluated on face recognition system. Appl Math Comput 205(2):716–725
Aghdam MH, Ghasem-Aghaee N, Basiri ME (2009) Text feature selection using ant colony optimization. Expert Syst Appl 36(3):6843–6853
Karimi F, Dowlatshahi MB, Hashemi A (2023) SemiACO: a semi-supervised feature selection based on ant colony optimization. Expert Syst Appl 214:119130
Faramarzi A, Heidarinejad M, Stephens B, Mirjalili S (2020) Equilibrium optimizer: a novel optimization algorithm. Knowl-Based Syst 191:105190
Ahmed S, Ghosh KK, Mirjalili S, Sarkar R (2021) AIEOU: automata-based improved equilibrium optimizer with U-shaped transfer function for feature selection. Knowl-Based Syst 228:107283
Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67
Mafarja M, Mirjalili S (2018) Whale optimization approaches for wrapper feature selection. Appl Soft Comput 62:441–453
Geem ZW, Kim JH, Loganathan GV (2001) A new heuristic optimization algorithm: harmony search. Simulation 76(2):60–68
Ahmed S, Ghosh KK, Singh PK, Geem ZW, Sarkar R (2020) Hybrid of harmony search algorithm and ring theory-based evolutionary algorithm for feature selection. IEEE Access 8:102629–102645
Mafarja M, Qasem A, Heidari AA, Aljarah I, Faris H, Mirjalili S (2020) Efficient hybrid nature-inspired binary optimizers for feature selection. Cogn Comput 12:150–175
Pan H, Chen S, Xiong H (2023) A high-dimensional feature selection method based on modified gray wolf optimization. Appl Soft Comput 135:110031
Balochian S, Baloochian H (2019) Social mimic optimization algorithm and engineering applications. Expert Syst Appl 134:178–191
Ghosh KK, Singh PK, Hong J, Geem ZW, Sarkar R (2020) Binary social mimic optimization algorithm with X-shaped transfer function for feature selection. IEEE Access 8:97890–97906
Tharwat A, Gabel T (2020) Parameters optimization of support vector machines for imbalanced data using social ski driver algorithm. Neural Comput Appl 32:6925–6938
Alhussan AA, Abdelhamid AA, El-Kenawy E-SM, Ibrahim A, Eid MM, Khafaga DS, Ahmed AE (2023) A binary waterwheel plant optimization algorithm for feature selection. IEEE Access
Mirjalili S (2016) SCA: a sine cosine algorithm for solving optimization problems. Knowl-Based Syst 96:120–133
Takieldeen AE, El-Kenawy E-SM, Hadwan M, Zaki RM (2022) Dipper throated optimization algorithm for unconstrained function and feature selection. Comput Mater Contin 72:1465–1481
Abdelhamid AA, El-Kenawy E-SM, Ibrahim A, Eid MM, Khafaga DS, Alhussan AA, Mirjalili S, Khodadadi N, Lim WH, Shams MY (2023) Innovative feature selection method based on hybrid sine cosine and dipper throated optimization algorithms. IEEE Access 11:79750–79776
Bertsimas D, Tsitsiklis J (1993) Simulated annealing. Stat Sci 8(1):10–15
Mafarja MM, Mirjalili S (2017) Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing 260:302–312
Khan K, Rehman SU, Aziz K, Fong S, Sarasvady S (2014) DBSCAN: past, present and future. In: The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014), pp. 232–238. IEEE
Schubert E, Sander J, Ester M, Kriegel HP, Xu X (2017) DBSCAN revisited, revisited: why and how you should (still) use DBSCAN. ACM Trans Database Syst (TODS) 42(3):1–21
Lai W, Zhou M, Hu F, Bian K, Song Q (2019) A new DBSCAN parameters determination method based on improved MVO. IEEE Access 7:104085–104095
Sawant K (2014) Adaptive methods for determining DBSCAN parameters. Int J Innov Sci Eng Technol 1(4):329–334
Starczewski A, Goetzen P, Er MJ (2020) A new method for automatic determining of the DBSCAN parameters. J Artif Intell Soft Comput Res 10:209
Ankerst M, Breunig MM, Kriegel H-P, Sander J (1999) OPTICS: ordering points to identify the clustering structure. ACM SIGMOD Rec 28(2):49–60
Liu P, Zhou D, Wu N (2007) VDBSCAN: varied density based spatial clustering of applications with noise. In: 2007 International Conference on Service Systems and Service Management, pp. 1–4. IEEE
Author information
Contributions
The authors confirm their contributions to the paper as follows: Ali Hamdipour and Abdolali Basiri contributed to the study conception and design; Ali Hamdipour and Mostafa Zaare contributed to data collection; Ali Hamdipour, Abdolali Basiri, and Mostafa Zaare contributed to the analysis and interpretation of results; Ali Hamdipour contributed to draft manuscript preparation; Abdolali Basiri, Mostafa Zaare, and Seyedali Mirjalili supervised the study; Abdolali Basiri and Seyedali Mirjalili contributed to review and editing. All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
This study used only publicly available data, ensuring privacy; no private or personal information was used. Therefore, formal ethical approval and consent were not necessary. We followed all legal guidelines for research ethics.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Details of comparisons
2.1 In terms of classification accuracy
Compared with the BSMO feature selection algorithm in terms of classification accuracy, BSMO achieved higher accuracy in two cases (BreastEW, Exactly2), the two algorithms tied in five cases (BreastCancer, PenglungEW, Wine, Tic-tac-toe, Zoo), and ARO-DBSCAN achieved higher accuracy in the remaining 14 cases. Overall, therefore, ARO-DBSCAN performed better than BSMO in classification accuracy.
Compared with the WOASAT-2 feature selection algorithm, WOASAT-2 did not achieve higher classification accuracy in any case; the two algorithms tied in five cases (PenglungEW, Wine, Tic-tac-toe, GSAUFM, E-mails), and ARO-DBSCAN achieved higher accuracy in the remaining 16 cases. Overall, ARO-DBSCAN performed better than WOASAT-2 in classification accuracy.
Compared with the SSD-LAHC feature selection algorithm, SSD-LAHC achieved higher accuracy in two cases (BreastEW, Exactly2), the two algorithms tied in nine cases (PenglungEW, BreastCancer, Exactly, M-of-n, Tic-tac-toe, Vote, Wine, Zoo, ARCENE), and ARO-DBSCAN achieved higher accuracy in the remaining 10 cases. Overall, ARO-DBSCAN performed better than SSD-LAHC in classification accuracy.
Compared with the RTHS feature selection algorithm, RTHS did not achieve higher classification accuracy in any case; the two algorithms tied in five cases (Tic-tac-toe, PenglungEW, Wine, Zoo, CongressEW), and ARO-DBSCAN achieved higher accuracy in the remaining 16 cases. Overall, ARO-DBSCAN performed better than RTHS in classification accuracy.
Compared with the WOA-CM feature selection algorithm, WOA-CM achieved higher accuracy in one case (Exactly2), the two algorithms tied in three cases (Tic-tac-toe, PenglungEW, Wine), and ARO-DBSCAN achieved higher accuracy in the remaining 17 cases. Overall, ARO-DBSCAN performed better than WOA-CM in classification accuracy.
Compared with the AAPSO feature selection algorithm, AAPSO achieved higher accuracy in one case (Exactly2), the two algorithms tied in four cases (Tic-tac-toe, Zoo, Wine, PenglungEW), and ARO-DBSCAN achieved higher accuracy in the remaining 16 cases. Overall, ARO-DBSCAN performed better than AAPSO in classification accuracy.
Compared with the AIEOU feature selection algorithm, AIEOU achieved higher accuracy in one case (Exactly2), the two algorithms tied in three cases (Tic-tac-toe, Wine, PenglungEW), and ARO-DBSCAN achieved higher accuracy in the remaining 17 cases. Overall, ARO-DBSCAN performed better than AIEOU in classification accuracy.
Compared with the ASGW feature selection algorithm, ARO-DBSCAN achieved higher classification accuracy in all 21 cases. Overall, ARO-DBSCAN performed better than ASGW in classification accuracy.
2.2 In terms of NSF
Table 7 presents the comparison of ARO-DBSCAN with the state-of-the-art algorithms in terms of NSF (number of selected features). Compared with the BSMO feature selection algorithm, BSMO obtained a lower NSF in four cases (Ionosphere, WaveformEW, KrVsKpEW, SpectEW), the two algorithms tied in two cases (BreastCancer, Tic-tac-toe), and ARO-DBSCAN obtained a lower NSF in the remaining 15 cases. Overall, therefore, ARO-DBSCAN performed better than BSMO in NSF.
Compared with the SSD-LAHC feature selection algorithm, SSD-LAHC obtained a lower NSF in five cases (Exactly, Zoo, Exactly2, WaveformEW, KrVsKpEW), the two algorithms tied in four cases (BreastCancer, M-of-n, Tic-tac-toe, Exactly), and ARO-DBSCAN obtained a lower NSF in the remaining 12 cases. Overall, ARO-DBSCAN performed better than SSD-LAHC in NSF.
Compared with the WOA-CM feature selection algorithm, WOA-CM obtained a lower NSF in four cases (WaveformEW, KrVsKpEW, Zoo, CongressEW), the two algorithms tied in one case (Tic-tac-toe), and ARO-DBSCAN obtained a lower NSF in the remaining 16 cases. Overall, ARO-DBSCAN performed better than WOA-CM in NSF.
Compared with the RTHS feature selection algorithm, RTHS obtained a lower NSF in three cases (BreastCancer, WaveformEW, KrVsKpEW), the two algorithms tied in one case (Tic-tac-toe), and ARO-DBSCAN obtained a lower NSF in the remaining 17 cases. Overall, ARO-DBSCAN performed better than RTHS in NSF.
Compared with the AAPSO feature selection algorithm, AAPSO obtained a lower NSF in three cases (Exactly2, WaveformEW, KrVsKpEW), the two algorithms tied in two cases (BreastCancer, Tic-tac-toe), and ARO-DBSCAN obtained a lower NSF in the remaining 16 cases. Overall, ARO-DBSCAN performed better than AAPSO in NSF.
Compared with the AIEOU feature selection algorithm, AIEOU obtained a lower NSF in six cases (BreastCancer, Zoo, WaveformEW, KrVsKpEW, SpectEW, CongressEW), the two algorithms tied in one case (Tic-tac-toe), and ARO-DBSCAN obtained a lower NSF in the remaining 14 cases. Overall, ARO-DBSCAN performed better than AIEOU in NSF.
Compared with the ASGW feature selection algorithm, ASGW obtained a lower NSF in three cases (Tic-tac-toe, WaveformEW, KrVsKpEW), and ARO-DBSCAN obtained a lower NSF in the remaining 18 cases. Overall, ARO-DBSCAN performed better than ASGW in NSF.
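For reference, the win/tie/loss counts reported throughout this appendix follow a simple per-dataset tally over the 21 benchmark datasets. The sketch below shows the counting rule, where higher values win for classification accuracy and lower values win for NSF; tally is a hypothetical helper with placeholder score vectors, not code from the paper.

```python
def tally(ours, baseline, higher_is_better=True):
    """Count per-dataset wins, ties, and losses for ARO-DBSCAN against a
    baseline; pass higher_is_better=False for NSF, where fewer selected
    features is better."""
    wins = ties = losses = 0
    for a, b in zip(ours, baseline):
        if a == b:
            ties += 1
        elif (a > b) == higher_is_better:
            wins += 1
        else:
            losses += 1
    return wins, ties, losses

# Example with hypothetical accuracies: ARO-DBSCAN wins 2, ties 1, loses 0.
print(tally([0.95, 0.90, 0.88], [0.93, 0.90, 0.85]))  # -> (2, 1, 0)
```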
About this article
Cite this article
Hamdipour, A., Basiri, A., Zaare, M. et al. Artificial rabbits optimization algorithm with automatically DBSCAN clustering algorithm to similarity agent update for features selection problems. J Supercomput 81, 150 (2025). https://doi.org/10.1007/s11227-024-06606-8