Uncertainty measurement of partially labeled categorical data with application to semi-supervised attribute reduction

Artificial Intelligence Review

Abstract

In many practical applications of machine learning, the high cost of labelling produces large amounts of partially labeled categorical data, which call for semi-supervised learning algorithms. This paper studies uncertainty measurement (UM) of partially labeled categorical data and considers semi-supervised attribute reduction in a partially labeled categorical decision information system (p-CDIS). It is first pointed out that a discernibility pair set for categorical data is actually a distinguishable relation. Then, a p-CDIS is divided into two categorical decision information systems: the labeled categorical decision information system (l-CDIS) and the unlabeled categorical decision information system (u-CDIS). Next, four degrees of importance are defined based on the indistinguishable relation, the distinguishable relation and the dependence function. Each is a weighted sum of the corresponding quantities on the l-CDIS and the u-CDIS, with weights determined by the label missing rate, and can be regarded as a UM of the p-CDIS. Numerical experiments and statistical tests on 10 datasets verify their effectiveness. In addition, an adaptive semi-supervised reduction algorithm based on these degrees of importance is proposed, which automatically adapts to different label missing rates. Finally, experiments and statistical tests on 10 datasets show that the proposed algorithm is statistically better than several state-of-the-art algorithms in terms of classification accuracy.
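The abstract describes a pipeline: split the p-CDIS into its labeled and unlabeled parts, score an attribute subset on each part, mix the two scores using the label missing rate, then select attributes with the resulting degree of importance. The paper's four exact definitions are not given in this preview, so the sketch below assumes one plausible instance. The l-CDIS score is the classical rough-set dependency. The u-CDIS score is the share of discernible pairs (the distinguishable relation) that the subset preserves. The weights are 1 - ρ and ρ, where ρ is the label missing rate. The names `importance` and `greedy_reduct` and the stopping rule are hypothetical illustrations, not the authors' algorithm.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Blocks of the indiscernibility relation IND(attrs): two objects share
    a block iff they take equal values on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def discernible_pairs(rows, attrs):
    """Count unordered object pairs that attrs can tell apart, i.e. pairs
    outside IND(attrs) (the distinguishable relation)."""
    n = len(rows)
    same = sum(len(b) * (len(b) - 1) // 2 for b in partition(rows, attrs))
    return n * (n - 1) // 2 - same


def dependency(rows, labels, attrs):
    """Classical rough-set dependency |POS_attrs(D)| / |U|: the share of
    objects whose IND(attrs) block carries a single decision label."""
    if not rows:
        return 0.0
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({labels[i] for i in b}) == 1)
    return pos / len(rows)


def importance(rows, labels, attrs, all_attrs):
    """HYPOTHETICAL semi-supervised importance of attrs (not the paper's
    formula). labels[i] is None for an unlabeled object."""
    lab = [i for i, y in enumerate(labels) if y is not None]
    unl = [i for i, y in enumerate(labels) if y is None]
    rho = len(unl) / len(rows)  # label missing rate
    # l-CDIS score: dependency of the decision on attrs over labeled objects.
    sup = dependency([rows[i] for i in lab], [labels[i] for i in lab], attrs)
    # u-CDIS score: share of the full set's discernible pairs kept by attrs.
    u_rows = [rows[i] for i in unl]
    total = discernible_pairs(u_rows, all_attrs)
    unsup = discernible_pairs(u_rows, attrs) / total if total else 1.0
    # Weighted sum: more missing labels shift weight to the unlabeled part.
    return (1 - rho) * sup + rho * unsup


def greedy_reduct(rows, labels):
    """HYPOTHETICAL forward selection: repeatedly add the attribute with the
    largest importance until the subset matches the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = importance(rows, labels, all_attrs, all_attrs)
    red = []
    while importance(rows, labels, red, all_attrs) < target - 1e-12:
        best = max((a for a in all_attrs if a not in red),
                   key=lambda a: importance(rows, labels, red + [a], all_attrs))
        red.append(best)
    return red
```

Consider the toy table `rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 0), (1, 1, 1)]` with `labels = ['a', 'a', 'b', 'b', None, None]`. Attribute 0 alone separates both classes and the two unlabeled objects, so `greedy_reduct` returns `[0]`. Both component scores are monotone in the attribute subset, so the loop stops at the latest once every attribute has been added.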

Acknowledgements

The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work is supported by the National Natural Science Foundation of China (12261096), the Natural Science Foundation of Guangxi (2020GXNSFAA159155) and the Natural Science Foundation of Yulin (202125001).

Author information

Contributions

PW: Methodology, Writing (original draft), Investigation. QZ: Methodology, Software, Investigation. WP: Reviewing, Editing. ZL: Software, Validation, Investigation. C-FW: Reviewing, Editing.

Corresponding authors

Correspondence to Qinli Zhang or Ching-Feng Wen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, P., Zhang, Q., Pedrycz, W. et al. Uncertainty measurement of partially labeled categorical data with application to semi-supervised attribute reduction. Artif Intell Rev 56, 14731–14764 (2023). https://doi.org/10.1007/s10462-023-10518-z

  • DOI: https://doi.org/10.1007/s10462-023-10518-z
