High-dimensional data pose a significant challenge in machine learning, and feature selection has been proposed as a viable way to address it. Because feature selection involves an intricate search space, swarm intelligence algorithms have gained popularity for their strong search capabilities. This study introduces Clustering Probabilistic Particle Swarm Optimization (CPPSO), which modifies traditional particle swarm optimization by representing velocity with probabilities and by adding an elitism mechanism. In addition, CPPSO employs a clustering strategy based on the K-means algorithm, using the Hamming distance to divide the population into two sub-populations in order to improve performance. To assess CPPSO, a comparative analysis is conducted against seven existing algorithms on twenty datasets, all based on real-world problems. Fifteen of these datasets are frequently used in feature selection research, while the remaining five comprise imbalanced and multi-label datasets. The experimental results show that CPPSO outperforms the comparative algorithms on the majority of the datasets across a range of evaluation criteria.
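
The clustering step described above can be illustrated with a minimal sketch. It assumes each particle is a binary feature mask and splits the population into two groups with a K-means-style loop under the Hamming distance. The function name `hamming_two_clusters`, the farthest-point initialization, and the majority-vote centroid update are illustrative assumptions, not details taken from CPPSO itself.

```python
import numpy as np


def hamming_two_clusters(population, max_iter=20):
    """Split binary particles into two groups with a K-means-style loop.

    Assumptions (not from the paper): centroids start as the first particle
    and the particle farthest from it, and each centroid is updated by a
    per-bit majority vote over its members (ties resolve to 1).
    """
    pop = np.asarray(population, dtype=int)

    # Hypothetical deterministic initialization of the two centroids.
    first = pop[0]
    farthest = pop[np.argmax((pop != first).sum(axis=1))]
    centroids = np.stack([first, farthest])
    labels = np.zeros(len(pop), dtype=int)

    for _ in range(max_iter):
        # Hamming distance from every particle to both centroids: shape (n, 2).
        dists = (pop[:, None, :] != centroids[None, :, :]).sum(axis=2)
        new_labels = dists.argmin(axis=1)

        # Majority-vote update; an empty cluster keeps its previous centroid.
        new_centroids = centroids.copy()
        for k in range(2):
            members = pop[new_labels == k]
            if len(members) > 0:
                new_centroids[k] = (members.mean(axis=0) >= 0.5).astype(int)

        converged = (np.array_equal(new_labels, labels)
                     and np.array_equal(new_centroids, centroids))
        labels, centroids = new_labels, new_centroids
        if converged:
            break

    return labels, centroids
```

Calling `hamming_two_clusters(masks)` on an array of 0/1 feature masks returns one label per particle (0 or 1) together with the two binary centroids. How the resulting sub-populations are then used during the search is not specified in this abstract and is left out of the sketch.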

The data that support the findings of this study are available from the corresponding author upon reasonable request.
