Abstract
Online streaming feature selection plays an important role in handling high-dimensional data. Many online streaming feature selection algorithms have been combined with evolutionary algorithms (EA), but most of them use single-objective optimization, which has limitations. They also ignore the interaction between features, even though features combined with each other may generate higher relevance. This paper therefore proposes a new online group feature selection algorithm, PSO-NRS, which fuses the particle swarm optimization (PSO) algorithm with neighborhood rough set (NRS) theory. PSO-NRS selects the set of features that is highly correlated with the labels by combining features randomly. Using NRS for online feature selection does not require any domain knowledge, which lets PSO-NRS generalize better and handle different types of data. PSO-NRS applies two layers of filtering for online feature selection. In the first filtering layer, two objective functions are designed and multi-objective optimization by particle swarm is used to select the set of features with the highest relevance. In the second filtering layer, a search strategy is defined using a rough set-based evaluation method to complete the final feature selection. The interactions between features are considered and redundant features are removed during the two filtering layers. Finally, PSO-NRS is evaluated on 14 datasets of different types and compared with six state-of-the-art online feature selection algorithms to validate its effectiveness and generalization.
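To make the two ingredients named above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard neighborhood rough set dependency degree (the fraction of samples whose neighborhood under a feature subset contains only samples of the same label) and a generic Pareto dominance test of the kind used in multi-objective PSO. The function names, the Euclidean distance, the radius `delta`, and the convention that both objectives are maximized are all assumptions made for illustration; the abstract does not specify the two objective functions.

```python
import numpy as np


def neighborhood_dependency(X, y, features, delta=0.2):
    """Dependency of labels y on a feature subset (illustrative NRS measure).

    Returns the fraction of samples whose delta-neighborhood, measured on
    `features` only, contains no sample with a different label.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    sub = X[:, list(features)]
    # Pairwise Euclidean distances restricted to the chosen feature subset.
    diff = sub[:, None, :] - sub[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    neighbors = dist <= delta
    same_label = y[:, None] == y[None, :]
    # A sample is in the positive region if every neighbor shares its label.
    consistent = np.all(~neighbors | same_label, axis=1)
    return float(consistent.mean())


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (assumed: maximize both)."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a >= b) and np.any(a > b))


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X_toy = [[0.0, 0.0], [0.1, 1.0], [1.0, 0.05], [0.9, 0.95]]
y_toy = [0, 0, 1, 1]
dep_f0 = neighborhood_dependency(X_toy, y_toy, [0])
dep_f1 = neighborhood_dependency(X_toy, y_toy, [1])
```

In this toy case the subset containing feature 0 receives a higher dependency than the subset containing feature 1, which is the kind of signal a rough set-based evaluation can use to rank candidate feature groups without domain knowledge.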
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 51975505 and the Hebei Natural Science Foundation under Grant Nos. G2021203010 and F2021203038. It was also supported by the Key Laboratory of Robotics and Intelligent Equipment of Guangdong Regular Institutions of Higher Education under Grant No. 2017KSYS009.
Additional information
Ze Liu, Dianlong You, Weiwei Pan, Junjie Zhao and Yefan Cao contributed equally to this work.
About this article
Cite this article
Liang, S., Liu, Z., You, D. et al. PSO-NRS: an online group feature selection algorithm based on PSO multi-objective optimization. Appl Intell 53, 15095–15111 (2023). https://doi.org/10.1007/s10489-022-04275-9
DOI: https://doi.org/10.1007/s10489-022-04275-9