Semi-supervised feature selection by minimum neighborhood redundancy and maximum neighborhood relevancy

Applied Intelligence

Abstract

Feature selection is a widely used data preprocessing technique in machine learning that improves model performance in downstream tasks such as fault diagnosis, biological recognition, and object detection. However, incomplete supervision, which arises when only limited labeled data are available, makes it difficult to obtain the optimal feature subset for model input. To address the problem that label scarcity may degrade feature evaluation and selection, we introduce a novel semi-supervised feature selection algorithm, termed Semi2MNR, that integrates the principles of Minimum Neighborhood Redundancy and Maximum Neighborhood Relevancy. First, k-nearest neighborhood granulation is used to construct a collection of neighborhood uncertainty measures from the perspective of information theory. Then, neighborhood mutual information is used to assess feature-to-label relevance on the labeled samples and feature-to-feature redundancy on the unlabeled samples. Finally, with min-neighborhood-redundancy and max-neighborhood-relevancy as the evaluation criterion, a forward sequential search algorithm is devised to identify the minimally redundant and maximally relevant features. Experiments on 12 UCI data sets demonstrate the superiority of Semi2MNR on partially labeled data with varying labeling rates. Comparative analysis against other feature selection algorithms suggests that CART, KNN, and SVM classifiers fed with features selected by Semi2MNR consistently yield the best accuracies.
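
The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the full article is not reproduced here, so the exact definitions used by Semi2MNR are unknown. The sketch assumes the neighborhood mutual information of Hu et al. (2011), NMI(A;B) = -(1/n) Σ log(|A_i|·|B_i| / (n·|A_i ∩ B_i|)), single-feature k-nearest neighborhoods that include the sample itself, and an mRMR-style score (relevance minus mean redundancy). The function names `knn_neighborhoods`, `neighborhood_mi`, and `select_features` are hypothetical.

```python
import numpy as np

def knn_neighborhoods(x, k):
    """Boolean n x n matrix; row i marks the k nearest samples to sample i (itself included)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(d, -1.0)  # guarantees each sample ranks first in its own neighborhood
    idx = np.argsort(d, axis=1, kind="stable")[:, :k]
    nb = np.zeros(d.shape, dtype=bool)
    np.put_along_axis(nb, idx, True, axis=1)
    return nb

def neighborhood_mi(A, B):
    """Neighborhood mutual information of two neighborhood matrices (assumed Hu et al. 2011 form)."""
    n = A.shape[0]
    a, b = A.sum(axis=1), B.sum(axis=1)
    ab = (A & B).sum(axis=1)  # >= 1 because every sample is in its own neighborhoods
    return float(-np.mean(np.log(a * b / (n * ab))))

def select_features(X_lab, y_lab, X_unl, k, n_select):
    """Greedy forward search; the scoring rule is an assumed mRMR-style difference."""
    X_lab, X_unl, y_lab = np.asarray(X_lab), np.asarray(X_unl), np.asarray(y_lab)
    n_feat = X_lab.shape[1]
    label_nb = y_lab[:, None] == y_lab[None, :]  # samples sharing a class label
    lab_nb = [knn_neighborhoods(X_lab[:, j], k) for j in range(n_feat)]
    unl_nb = [knn_neighborhoods(X_unl[:, j], k) for j in range(n_feat)]
    # Relevance: feature-to-label NMI, computed on labeled samples only.
    relevance = [neighborhood_mi(nb, label_nb) for nb in lab_nb]
    selected = []
    while len(selected) < min(n_select, n_feat):
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            # Redundancy: mean feature-to-feature NMI, computed on unlabeled samples only.
            red = (np.mean([neighborhood_mi(unl_nb[j], unl_nb[s]) for s in selected])
                   if selected else 0.0)
            score = relevance[j] - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Calling `select_features(X_lab, y_lab, X_unl, k=3, n_select=5)` returns feature indices in the order they were picked. Splitting the data this way lets the scarce labels drive only the relevance term, while the more plentiful unlabeled samples estimate redundancy, which matches the division of labor described in the abstract.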

Availability of Data and Materials

The data that support the findings of this study are openly available in UCI Machine Learning Repository at https://archive.ics.uci.edu.

Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their insightful and constructive comments, which greatly improved the quality of this article. This work is supported by the National Science Foundation of China (No. 62076111) and the National College Students’ Innovation and Entrepreneurship Training Plan Program (No. 202410289027Z).

Funding

This study was funded by the National Science Foundation of China (No. 62076111) and the National College Students’ Innovation and Entrepreneurship Training Plan Program (No. 202410289027Z).

Author information

Contributions

Conceptualization: Damo Qian, Keyu Liu, Shiming Zhang; Methodology: Keyu Liu; Formal analysis and investigation: Damo Qian, Keyu Liu, Shiming Zhang; Writing - original draft preparation: Damo Qian; Writing - review and editing: Keyu Liu, Xibei Yang; Funding acquisition: Xibei Yang; Resources: Keyu Liu, Shiming Zhang; Supervision: Keyu Liu, Xibei Yang.

Corresponding author

Correspondence to Keyu Liu.

Ethics declarations

Ethical and Informed Consent for Data Used

The data used in this study were from public databases, which are openly accessible and do not contain any personally identifiable information. Therefore, the issue of informed consent does not arise. All data handling procedures adhered to ethical guidelines for research.

Competing Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qian, D., Liu, K., Zhang, S. et al. Semi-supervised feature selection by minimum neighborhood redundancy and maximum neighborhood relevancy. Appl Intell 54, 7750–7764 (2024). https://doi.org/10.1007/s10489-024-05578-9

  • DOI: https://doi.org/10.1007/s10489-024-05578-9