Abstract
In real-world scenarios, datasets are often only partially labeled because acquiring decision labels is costly. Filling in the missing labels completes such datasets and preserves the feature information carried by the unlabeled samples. Moreover, in the era of big data, datasets tend to be high-dimensional, which complicates subsequent data processing. This study introduces a new semi-supervised feature selection technique. First, a local density decision-labeling algorithm fills in the missing decision labels of the semi-supervised dataset, yielding a fully supervised dataset. Next, a fuzzy dependency-based feature selection approach is presented to identify and retain the most relevant features of the completed dataset. Finally, a series of experiments validates the effectiveness and reliability of the proposed method.
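The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption rather than the authors' algorithm: the Gaussian density kernel and its `radius`, the rule of copying the label of the nearest labeled sample, the similarity relation `1 - |x_a - z_a|` on features scaled to [0, 1], and the greedy forward search with tolerance `tol`. The function names `fill_labels`, `fuzzy_dependency`, and `select_features` are hypothetical.

```python
import math

def fill_labels(X, y, radius=0.5):
    """Fill missing labels (None), visiting unlabeled samples from dense to
    sparse regions and copying the label of the nearest labeled sample.
    Assumed rule for illustration; needs at least one labeled sample."""
    n = len(X)
    dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Local density: Gaussian-weighted neighbour count (assumed kernel).
    density = [sum(math.exp(-(dist[i][j] / radius) ** 2)
                   for j in range(n) if j != i) for i in range(n)]
    labels = list(y)
    unlabeled = sorted((i for i in range(n) if labels[i] is None),
                       key=lambda i: -density[i])
    for i in unlabeled:
        labeled = [j for j in range(n) if labels[j] is not None]
        nearest = min(labeled, key=lambda j: dist[i][j])
        labels[i] = labels[nearest]
    return labels

def fuzzy_dependency(X, y, feats):
    """Fuzzy-rough dependency of the decision on a non-empty feature subset.
    Each sample's positive-region membership is the smallest dissimilarity
    to any sample of a different class; the dependency is the mean."""
    n = len(X)
    total = 0.0
    for i in range(n):
        membership = 1.0
        for j in range(n):
            if y[j] != y[i]:
                # Similarity under the subset: minimum over its features.
                sim = min(1.0 - abs(X[i][a] - X[j][a]) for a in feats)
                membership = min(membership, 1.0 - sim)
        total += membership
    return total / n

def select_features(X, y, tol=1e-9):
    """Greedy forward selection: add the feature that raises the dependency
    most, and stop once the gain is no larger than tol."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        score, feat = max((fuzzy_dependency(X, y, selected + [a]), a)
                          for a in remaining)
        if score - best <= tol:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1],
     [0.8, 0.5], [0.9, 0.1], [1.0, 0.9]]
y = [0, None, None, 1, None, None]

filled = fill_labels(X, y)
print(filled)                      # [0, 0, 0, 1, 1, 1]
print(select_features(X, filled))  # [0]
```

On this toy data the noisy feature adds nothing to the dependency, so the search stops after the first feature. Real data would need feature scaling to [0, 1] before the similarity relation is meaningful.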
Acknowledgements
We would like to thank the Editor-in-Chief, editors, and anonymous reviewers for their insightful and constructive comments, which have greatly helped us improve the quality of the paper. This work was supported by the National Natural Science Foundation of China (Grant nos. 12261010, 12326353), the Natural Science Foundation of Guangxi (2023GXNSFBA026019), the Key Laboratory of Software Engineering in Guangxi MinZu University (2022-18XJSY-03), the Postdoctoral Fellowship Program of CPSF (no. GZB20230092), the China Postdoctoral Science Foundation (no. 2023M740383), and the Natural Science Foundation of Sichuan Province (no. 24NSFSC1654).
Author information
Contributions
GZ: Conceptualization, Methodology, Software, Data curation, Writing-original draft. JH: Software, Visualization. PZ: Methodology, Writing-review and editing.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Zhang, G., Hu, J. & Zhang, P. Leveraging Local Density Decision Labeling and Fuzzy Dependency for Semi-supervised Feature Selection. Int. J. Fuzzy Syst. 26, 2805–2820 (2024). https://doi.org/10.1007/s40815-024-01740-0
DOI: https://doi.org/10.1007/s40815-024-01740-0