Abstract
Dealing with high-dimensional data poses a significant challenge in machine learning, and feature selection is a widely used way to address it. Because the search space of feature selection is intricate, swarm intelligence algorithms have gained popularity for their strong search capabilities. This study introduces Clustering Probabilistic Particle Swarm Optimization (CPPSO), which modifies traditional particle swarm optimization by representing velocity as per-feature selection probabilities and by adding an elitism mechanism. Furthermore, CPPSO employs a clustering strategy based on the K-means algorithm with the Hamming distance, dividing the population into two sub-populations to improve performance. To assess CPPSO's performance, a comparative analysis is conducted against seven existing algorithms on twenty real-world datasets: fifteen are frequently used in feature selection research, and the remaining five include imbalanced and multi-label datasets. The experimental results demonstrate that CPPSO outperforms the comparison algorithms on the majority of the datasets across a range of evaluation criteria.
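The abstract names two mechanisms: a velocity kept as per-feature selection probabilities and a K-means-style split of the binary population under the Hamming distance. The paper's exact update rules are not reproduced on this page, so the sketch below is only illustrative; the sigmoid squashing, the inertia and acceleration coefficients, and the majority-vote cluster centers are assumptions rather than CPPSO's definitive formulation.

```python
# Illustrative sketch (not the authors' exact method): probability-based
# velocities for binary feature masks, plus a 2-means split of the population
# using the Hamming distance.
import numpy as np

rng = np.random.default_rng(0)

def hamming(a, b):
    """Hamming distance between two binary feature masks."""
    return int(np.sum(a != b))

def two_means_hamming(pop, iters=20):
    """Split a binary population into two sub-populations, K-means style,
    using Hamming distance and majority-vote centers (an assumption)."""
    centers = pop[rng.choice(len(pop), size=2, replace=False)].copy()
    labels = np.zeros(len(pop), dtype=int)
    for _ in range(iters):
        # assign each particle to the nearer center
        d = np.array([[hamming(p, c) for c in centers] for p in pop])
        labels = d.argmin(axis=1)
        for k in range(2):
            members = pop[labels == k]
            if len(members):  # majority vote per feature as the new center
                centers[k] = (members.mean(axis=0) >= 0.5).astype(int)
    return labels

def probabilistic_velocity_step(positions, velocities, pbest, gbest,
                                w=0.7, c1=1.5, c2=1.5):
    """One illustrative update: velocities are squashed into [0, 1] and read
    as per-feature selection probabilities; positions are resampled from them."""
    n, d = positions.shape
    r1, r2 = rng.random((n, d)), rng.random((n, d))
    velocities = (w * velocities
                  + c1 * r1 * (pbest - positions)
                  + c2 * r2 * (gbest - positions))
    prob = 1.0 / (1.0 + np.exp(-velocities))        # selection probabilities
    positions = (rng.random((n, d)) < prob).astype(int)
    return positions, velocities

# toy run: 10 particles over 8 candidate features
pop = rng.integers(0, 2, size=(10, 8))
vel = np.zeros((10, 8))
labels = two_means_hamming(pop)                      # two sub-populations
pop, vel = probabilistic_velocity_step(pop, vel, pbest=pop.copy(), gbest=pop[0])
print(labels, pop.shape)
```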






Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This research was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI under Grant JP22H03643, Japan Science and Technology Agency (JST) Support for Pioneering Research Initiated by the Next Generation (SPRING) under Grant JPMJSP2145, and JST through the Establishment of University Fellowships towards the Creation of Science Technology Innovation under Grant JPMJFS2115.
Author information
Contributions
JG: Conceptualization, Writing - original draft, Methodology, Software. ZW: Methodology, Software, Supervision, Writing - review & editing. ZL: Conceptualization, Supervision, Writing - review & editing. R-LW: Conceptualization, Supervision, Writing - review & editing. ZW: Conceptualization, Supervision, Writing - review & editing. SG: Conceptualization, Supervision, Writing - review & editing.
Ethics declarations
Conflict of interest
The authors certify that they have no affiliations with, or involvement in, any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements) or non-financial interest (such as personal or professional relationships, affiliations, knowledge, or beliefs) in the subject matter or materials discussed in this manuscript. The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gao, J., Wang, Z., Lei, Z. et al. Feature selection with clustering probabilistic particle swarm optimization. Int. J. Mach. Learn. & Cyber. 15, 3599–3617 (2024). https://doi.org/10.1007/s13042-024-02111-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13042-024-02111-9