Abstract
Feature selection is a vital technique for reducing data dimensionality. Although many granular computing-based feature selection algorithms have been proposed, most treat feature selection as a supervised learning task that requires a large number of labeled instances. However, obtaining sufficient labeled data is expensive and time-consuming. To address this limitation, a novel semi-supervised feature selection framework is developed that leverages both labeled and unlabeled data. Specifically, the discernibility matrix is used to measure feature relevance on the labeled data, and mutual information is employed to evaluate feature significance on the unlabeled data. By combining these supervised and unsupervised metrics, a greedy feature selection algorithm is proposed for semi-supervised learning scenarios. The proposed feature measure, based on the discernibility matrix and mutual information, can select more discriminative features and thereby improve the generalization performance of the learning model. Finally, experiments conducted on ten UCI semi-supervised datasets demonstrate that the proposed approach achieves superior performance over five state-of-the-art semi-supervised feature selection methods.
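The abstract does not state the exact formulas, so the following Python fragment is only a rough sketch of the general idea, not the authors' algorithm. It assumes discrete feature values, scores a feature on labeled data by the number of differently labeled sample pairs it distinguishes (the information a discernibility matrix records), and scores it on unlabeled data by its mean mutual information with the other features. The function names, the mean-MI score, the weight `alpha`, and the combination rule are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def discerned_pairs(X, y, f):
    """Pairs of labeled samples with different labels that feature f tells apart."""
    return {
        (i, j)
        for i, j in combinations(range(len(X)), 2)
        if y[i] != y[j] and X[i][f] != X[j][f]
    }


def mutual_information(a, b):
    """I(A;B) in bits for two equally long sequences of discrete values."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # p(u,v) * log2(p(u,v) / (p(u) p(v))), written with raw counts.
    return sum(
        (c / n) * log2(c * n / (pa[u] * pb[v]))
        for (u, v), c in pab.items()
    )


def greedy_select(X_lab, y, X_unl, k, alpha=0.5):
    """Pick k features by a weighted sum of both scores (hypothetical rule).

    Assumes at least two features and k no larger than the feature count.
    """
    m = len(X_lab[0])
    cols = [[row[f] for row in X_unl] for f in range(m)]
    # Unsupervised score: mean MI with every other feature (an assumption).
    unsup = [
        sum(mutual_information(cols[f], cols[g]) for g in range(m) if g != f)
        / (m - 1)
        for f in range(m)
    ]
    max_unsup = max(unsup) or 1.0
    pairs = {f: discerned_pairs(X_lab, y, f) for f in range(m)}
    total = len(set().union(*pairs.values())) or 1
    selected, covered = [], set()
    while len(selected) < k:
        def score(f):
            # Supervised part: share of pairs f discerns that are still uncovered.
            sup = len(pairs[f] - covered) / total
            return alpha * sup + (1 - alpha) * unsup[f] / max_unsup
        best = max((f for f in range(m) if f not in selected), key=score)
        selected.append(best)
        covered |= pairs[best]
    return selected
```

Calling `greedy_select(X_lab, y, X_unl, k=2)` with small lists of discrete rows returns the indices of the chosen features. Because the supervised term counts only pairs not yet covered by earlier picks, a feature that repeats the discriminating power of an already selected feature gains nothing from the labeled data in later rounds.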
Data availability and access
The data that support the findings of this study are openly available in the UCI machine learning repository at http://archive.ics.uci.edu/ml.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 62366019 and 61966016), the Natural Science Foundation of Jiangxi Province, China (No. 20224BAB202020), and the National Key Research and Development Program of China (No. 2022YFD1600202).
Author information
Contributions
Wenbin Qian: Conceptualization, Methodology, Formal analysis, Writing - original draft. Lijuan Wan: Data curation, Software, Visualization, Writing - original draft. Wenhao Shu: Validation, Writing - review & editing.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Qian, W., Wan, L. & Shu, W. Semi-supervised feature selection based on discernibility matrix and mutual information. Appl Intell 54, 7278–7295 (2024). https://doi.org/10.1007/s10489-024-05481-3
DOI: https://doi.org/10.1007/s10489-024-05481-3