Granular-ball-matrix-based incremental semi-supervised feature selection approach to high-dimensional variation using neighbourhood discernibility degree for ordered partially labelled dataset

Published in: Applied Intelligence

Abstract

In many real-world applications, data are ordered and only partially labelled, largely because of the cost of labelling. Existing methods for such data cope poorly with high-dimensional datasets and, when the feature set changes, typically must reprocess the data from scratch, which is highly inefficient. To address this, we introduce an incremental semi-supervised feature selection algorithm grounded in neighbourhood discernibility that incorporates pseudo-label granular balls and matrix-updating techniques. The approach evaluates feature significance separately for labelled and unlabelled data, using neighbourhood discernibility to identify an optimal feature subset. To improve computational efficiency on large datasets, a pseudo-label granular-ball technique partitions the dataset into more manageable samples before feature selection. For high-dimensional data, neighbourhood information is stored in matrices, with distance functions and matrix structures tailored to low- and high-dimensional settings. We further present a matrix-updating method that accommodates changes in the number of features. Experiments on 12 datasets, including 4 with more than 2000 features, show that the algorithm outperforms existing methods on large-sample and high-dimensional datasets and runs more than six times faster on average than comparable semi-supervised algorithms. It also improves average per-dataset accuracy by 1.4%, 0.6%, and 0.2% for the SVM, KNN, and Random Forest classifiers, respectively, over the best-performing compared algorithm.
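The abstract does not spell out the paper's exact procedure, but the granular-ball idea it builds on can be illustrated: samples are recursively split (here with a small 2-means loop) into balls until each ball's labelled points are sufficiently pure, and unlabelled points then inherit the majority label of their ball as a pseudo-label. This is a minimal sketch under stated assumptions, not the authors' algorithm; names such as `purity_threshold` and `min_size` are illustrative choices.

```python
# Illustrative sketch (not the paper's algorithm): granular-ball generation
# by recursive 2-means splitting, with pseudo-labels assigned to unlabelled
# points (marked y = -1) from the majority label inside each ball.
import numpy as np


def split_two_means(X, n_iter=10, seed=0):
    """Partition the rows of X into two clusters with a tiny 2-means loop."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for k in range(2):
            if (assign == k).any():
                centers[k] = X[assign == k].mean(axis=0)
    return assign


def granular_balls(X, y, purity_threshold=0.9, min_size=4):
    """Recursively split (X, y) into balls; y = -1 marks unlabelled points.

    A ball stops splitting once its labelled points are sufficiently pure
    or it becomes too small; each ball's unlabelled points then take the
    ball's majority label as their pseudo-label."""
    balls = []

    def purity(labels):
        labelled = labels[labels != -1]
        if len(labelled) == 0:
            return 1.0  # no labelled evidence to justify further splitting
        return np.bincount(labelled).max() / len(labelled)

    def recurse(idx):
        if len(idx) <= min_size or purity(y[idx]) >= purity_threshold:
            balls.append(idx)
            return
        assign = split_two_means(X[idx])
        if assign.min() == assign.max():  # degenerate split: keep as one ball
            balls.append(idx)
            return
        recurse(idx[assign == 0])
        recurse(idx[assign == 1])

    recurse(np.arange(len(X)))

    # Pseudo-label unlabelled points from each ball's labelled majority.
    pseudo = y.copy()
    for idx in balls:
        labelled = y[idx][y[idx] != -1]
        if len(labelled):
            pseudo[idx[pseudo[idx] == -1]] = np.bincount(labelled).argmax()
    return balls, pseudo
```

Feature selection then operates on the (much smaller) set of balls and pseudo-labels rather than on every raw sample, which is where the reported speed-up on large datasets comes from.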

Figures 1–4 and Algorithms 1–3 appear in the full text.


Notes

  1. Data sources: https://jundongl.github.io/scikit-feature/


Acknowledgements

This work is supported by the National Natural Science Foundation of China (NO. 62376229) and Natural Science Foundation of Chongqing (NO. CSTB2023NSCQ-LZX0027).

Author information


Contributions

Weihua Xu: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation. Jinlong Li: Data curation, Methodology, Formal analysis, Software, Visualization, Writing - original draft, Writing - review & editing.

Corresponding author

Correspondence to Weihua Xu.

Ethics declarations

Competing interests

We confirm that there are no known conflicts of interest associated with this publication and that there has been no significant financial support for this work that could have influenced its outcome.

Publication ethics

We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us.

Intellectual property

We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Xu, W., Li, J. Granular-ball-matrix-based incremental semi-supervised feature selection approach to high-dimensional variation using neighbourhood discernibility degree for ordered partially labelled dataset. Appl Intell 55, 268 (2025). https://doi.org/10.1007/s10489-024-06134-1

