Abstract
In machine learning, feature selection is a widely used data preprocessing technique that plays a crucial role in improving model performance across diverse downstream tasks such as fault diagnosis, biological recognition, and object detection. However, incomplete supervision, stemming from the limited availability of labeled data, makes it difficult to obtain the optimal feature subset for model input. To address the problem that label scarcity may degrade feature evaluation and selection, we introduce a novel semi-supervised feature selection algorithm, termed Semi2MNR, which integrates the principles of Minimum Neighborhood Redundancy and Maximum Neighborhood Relevancy. First, k-nearest neighborhood granulation is used to construct a collection of neighborhood uncertainty measures from the perspective of information theory. Then, neighborhood mutual information is used to assess feature-to-label relevance based on labeled samples and feature-to-feature redundancy based on unlabeled samples. Finally, with min-neighborhood-redundancy and max-neighborhood-relevancy as the evaluation criterion, a forward sequential search algorithm is devised to identify the minimally redundant and maximally relevant features. Experiments on 12 UCI data sets demonstrate the superiority of Semi2MNR on partially labeled data with varying labeling rates. Comparative analysis against other feature selection algorithms suggests that CART, KNN, and SVM classifiers fed with features selected by Semi2MNR consistently yield optimal accuracies.
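The three steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default k, the use of neighborhood intersection as the joint neighborhood, and the difference form of the criterion (relevance minus mean redundancy) are all assumptions made for illustration. The neighborhood mutual information follows the general form given by Hu et al. (2011).

```python
import numpy as np

def knn_neighborhoods(column, k):
    """Boolean n x n matrix; row i marks x_i and its k nearest samples on one feature."""
    col = np.asarray(column, dtype=float)
    dist = np.abs(col[:, None] - col[None, :])
    np.fill_diagonal(dist, -1.0)  # each sample ranks first in its own neighborhood
    order = np.argsort(dist, axis=1, kind="stable")[:, :k + 1]
    mask = np.zeros(dist.shape, dtype=bool)
    np.put_along_axis(mask, order, True, axis=1)
    return mask

def label_neighborhoods(y):
    """Boolean n x n matrix; row i marks the samples sharing the label of x_i."""
    y = np.asarray(y)
    return y[:, None] == y[None, :]

def neighborhood_mi(a, b):
    """NMI(A;B) = -(1/n) * sum_i log(|A_i| * |B_i| / (n * |A_i & B_i|))."""
    n = a.shape[0]
    size_a = a.sum(axis=1)
    size_b = b.sum(axis=1)
    size_ab = (a & b).sum(axis=1)  # at least 1: x_i belongs to both of its neighborhoods
    return float(-np.mean(np.log(size_a * size_b / (n * size_ab))))

def select_features(X_labeled, y, X_unlabeled, n_select, k=3):
    """Greedy forward search; hypothetical criterion: relevance - mean redundancy."""
    n_features = X_labeled.shape[1]
    lab = [knn_neighborhoods(X_labeled[:, j], k) for j in range(n_features)]
    unl = [knn_neighborhoods(X_unlabeled[:, j], k) for j in range(n_features)]
    dec = label_neighborhoods(y)
    # Relevance uses labeled samples only; redundancy uses unlabeled samples only.
    relevance = [neighborhood_mi(lab[j], dec) for j in range(n_features)]
    selected = []
    while len(selected) < min(n_select, n_features):
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = (
                np.mean([neighborhood_mi(unl[j], unl[s]) for s in selected])
                if selected else 0.0
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Calling `select_features(X_labeled, y, X_unlabeled, n_select=5)` returns the indices of the chosen features in selection order. Both sample sets need more than k rows so that every k-nearest neighborhood is well defined.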






Availability of Data and Materials
The data that support the findings of this study are openly available in UCI Machine Learning Repository at https://archive.ics.uci.edu.
Acknowledgements
The authors would like to thank the editor and anonymous reviewers for their insightful and constructive comments, which greatly improved the quality of this article. This work is supported by the National Science Foundation of China (No. 62076111) and the National College Students’ Innovation and Entrepreneurship Training Plan Program (No. 202410289027Z).
Funding
This study was funded by the National Science Foundation of China (No. 62076111) and the National College Students’ Innovation and Entrepreneurship Training Plan Program (No. 202410289027Z).
Author information
Contributions
Conceptualization: Damo Qian, Keyu Liu, Shiming Zhang; Methodology: Keyu Liu; Formal analysis and investigation: Damo Qian, Keyu Liu, Shiming Zhang; Writing - original draft preparation: Damo Qian; Writing - review and editing: Keyu Liu, Xibei Yang; Funding acquisition: Xibei Yang; Resources: Keyu Liu, Shiming Zhang; Supervision: Keyu Liu, Xibei Yang.
Ethics declarations
Ethical and Informed Consent for Data Used
The data used in this study were from public databases, which are openly accessible and do not contain any personally identifiable information. Therefore, the issue of informed consent does not arise. All data handling procedures adhered to ethical guidelines for research.
Competing Interests
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Qian, D., Liu, K., Zhang, S. et al. Semi-supervised feature selection by minimum neighborhood redundancy and maximum neighborhood relevancy. Appl Intell 54, 7750–7764 (2024). https://doi.org/10.1007/s10489-024-05578-9
DOI: https://doi.org/10.1007/s10489-024-05578-9