Granular-ball-matrix-based incremental semi-supervised feature selection approach to high-dimensional variation using neighbourhood discernibility degree for ordered partially labelled dataset

Xu, Weihua; Li, Jinlong

doi:10.1007/s10489-024-06134-1


Published: 04 January 2025

Volume 55, article number 268, (2025)

Applied Intelligence


Abstract

In many real-world applications, data are ordered and only partially labelled, largely because labelling is costly. Existing methods for such data are inadequate, especially for high-dimensional datasets, which often have to be reprocessed from scratch, resulting in significant inefficiency. To address this, we introduce an incremental semi-supervised feature selection algorithm grounded in neighbourhood discernibility that incorporates pseudolabel granular balls and matrix updating techniques. The approach evaluates the significance of features for labelled and unlabelled data independently and uses neighbourhood distinguishability to identify an optimal feature subset. To improve computational efficiency, especially on large datasets, we adopt a pseudolabel granular-ball technique that segments the dataset into more manageable samples before feature selection. For high-dimensional data, we store neighbourhood information in matrices, with distance functions and matrix structures tailored to both low- and high-dimensional settings. We also present a matrix updating method designed to accommodate changes in the number of features. Experiments on 12 datasets, including 4 with more than 2000 features, show that our algorithm outperforms existing methods on large-sample and high-dimensional datasets and reduces running time by a factor of more than six on average compared with similar semi-supervised algorithms. Moreover, accuracy improves by an average of 1.4%, 0.6%, and 0.2% per dataset for SVM, KNN, and Random Forest classifiers, respectively, relative to the best-performing of the compared algorithms.
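The idea of storing neighbourhood information in a matrix and updating it when features change can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give their distance functions, matrix structures, or discernibility measure, so everything below is an assumption made for illustration. The helper names (`squared_gap_matrix`, `update_gap_matrix`, `discerned_pairs`), the squared Euclidean distance, the fixed radius, and the toy data are all hypothetical, and the ordered (dominance) structure and granular balls are not modelled.

```python
from itertools import combinations


def squared_gap_matrix(data, features):
    """Pairwise sum of squared differences over the given feature indices."""
    n = len(data)
    gaps = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        total = sum((data[i][f] - data[j][f]) ** 2 for f in features)
        gaps[i][j] = gaps[j][i] = total
    return gaps


def update_gap_matrix(gaps, data, added=(), removed=()):
    """Adjust the stored matrix in place instead of recomputing it.

    Only the contributions of the added or removed features are touched.
    """
    n = len(data)
    for i, j in combinations(range(n), 2):
        delta = sum((data[i][f] - data[j][f]) ** 2 for f in added)
        delta -= sum((data[i][f] - data[j][f]) ** 2 for f in removed)
        gaps[i][j] = gaps[j][i] = gaps[i][j] + delta
    return gaps


def discerned_pairs(gaps, labels, radius):
    """Count labelled pairs with different labels that are not neighbours.

    A toy stand-in for a discernibility measure (assumption, not the paper's).
    """
    count = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] is None or labels[j] is None:
            continue  # unlabelled samples are skipped in this toy measure
        if labels[i] != labels[j] and gaps[i][j] > radius ** 2:
            count += 1
    return count


# Toy data: four samples, three features; the last sample is unlabelled.
data = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.5],
]
labels = [0, 0, 1, None]

gaps = squared_gap_matrix(data, [0])         # start with feature 0 only
before = discerned_pairs(gaps, labels, 0.6)
update_gap_matrix(gaps, data, added=[1])     # feature 1 arrives later
after = discerned_pairs(gaps, labels, 0.6)
print(before, after)
```

In this sketch, each update costs time proportional to the number of changed features per sample pair rather than the full feature count, which is the kind of saving an incremental matrix update is meant to provide when the feature set is large.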


Notes

Data sources: https://jundongl.github.io/scikit-feature/


Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62376229) and the Natural Science Foundation of Chongqing (No. CSTB2023NSCQ-LZX0027).

Author information

Authors and Affiliations

College of Artificial Intelligence, Southwest University, Chongqing, 400715, China
Weihua Xu & Jinlong Li


Contributions

Weihua Xu: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation. Jinlong Li: Data curation, Methodology, Formal analysis, Software, Visualization, Writing - original draft, Writing - review & editing.

Corresponding author

Correspondence to Weihua Xu.

Ethics declarations

Competing interests

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Publication ethics

We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us.

Intellectual property

We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Xu, W., Li, J. Granular-ball-matrix-based incremental semi-supervised feature selection approach to high-dimensional variation using neighbourhood discernibility degree for ordered partially labelled dataset. Appl Intell 55, 268 (2025). https://doi.org/10.1007/s10489-024-06134-1


Accepted: 29 November 2024
Published: 04 January 2025
DOI: https://doi.org/10.1007/s10489-024-06134-1
