Feature selection using self-information uncertainty measures in neighborhood information systems

Xu, Jiucheng; Qu, Kanglin; Sun, Yuanhao; Yang, Jie

doi:10.1007/s10489-022-03760-5

Published: 11 June 2022

Applied Intelligence, Volume 53, pages 4524–4540 (2023)

Jiucheng Xu^1,2,
Kanglin Qu ORCID: orcid.org/0000-0002-5062-5012^1,2,
Yuanhao Sun^1,2 &
…
Jie Yang^1,2

Abstract

The neighborhood rough set model (NRS) has been widely applied to study feature selection. Nevertheless, the dependency, as a significant feature evaluation function in NRS, only focuses on the classification information in the lower approximation and ignores the classification information in the upper approximation, which affects the evaluation effect of this function. Consequently, this paper first defines the fuzziness using the upper approximation and proposes two self-information uncertainty measures based on the dependency and fuzziness. Second, combining the above two self-information uncertainty measures, a more comprehensive approximate self-information is proposed for evaluating the uncertainty of the classification information of feature subsets. Furthermore, a heuristic feature selection algorithm is constructed based on the approximate self-information. Third, to reduce the time cost of the constructed algorithm in processing high-dimensional datasets, we propose a two-stage selection strategy, in which the first stage adopts the Fisher score dimensionality reduction method (FS) with low time cost and stable performance to retain important features in the high-dimensional dataset as a candidate feature subset. Then, the second stage employs our algorithm to further reduce the candidate feature subset. Finally, the results of various feature selection algorithms on eleven datasets are presented, and the comparison results confirm that our algorithm is efficient.
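
The two-stage idea in the abstract can be illustrated with a minimal sketch. It assumes the standard Fisher score (between-class over within-class scatter) and the usual neighborhood rough set definitions of lower approximation, upper approximation, and dependency. The paper's fuzziness and approximate self-information measures are not given in this text, so the second stage below scores subsets by plain dependency as a stand-in. All function names, the Euclidean distance, the neighborhood radius `delta`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    # Small constant avoids division by zero when a feature is constant in every class.
    return between / (within + 1e-12)


def neighborhood_approximations(X, y, features, delta):
    """Per-class boolean masks for the lower and upper approximations."""
    X = np.asarray(X, dtype=float)[:, features]
    y = np.asarray(y)
    # Pairwise Euclidean distances restricted to the chosen features.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neigh = dist <= delta  # neigh[i, j]: sample j lies in the neighborhood of sample i
    lower, upper = {}, {}
    for label in np.unique(y):
        in_class = y == label
        # Lower approximation: every neighbor belongs to the class.
        lower[label] = ~(neigh & ~in_class).any(axis=1)
        # Upper approximation: at least one neighbor belongs to the class.
        upper[label] = (neigh & in_class).any(axis=1)
    return lower, upper


def dependency(X, y, features, delta):
    """Fraction of samples lying in the lower approximation of some class."""
    lower, _ = neighborhood_approximations(X, y, features, delta)
    positive = np.zeros(len(y), dtype=bool)
    for mask in lower.values():
        positive |= mask
    return float(positive.mean())


def two_stage_select(X, y, k, delta):
    """Stage 1: keep the k top Fisher-score features.
    Stage 2: greedy forward search over those candidates, scored by
    dependency (a stand-in, NOT the paper's approximate self-information)."""
    candidates = list(np.argsort(fisher_scores(X, y))[::-1][:k])
    selected, best = [], 0.0
    while candidates:
        gains = [dependency(X, y, selected + [f], delta) for f in candidates]
        top = int(np.argmax(gains))
        if gains[top] <= best:
            break  # no remaining candidate improves the score
        best = gains[top]
        selected.append(candidates.pop(top))
    return [int(f) for f in selected]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise,
# feature 2 is constant.
X_demo = [[0.0, 0.5, 1.0], [0.1, 0.9, 1.0], [0.2, 0.1, 1.0],
          [0.8, 0.4, 1.0], [0.9, 0.8, 1.0], [1.0, 0.2, 1.0]]
y_demo = [0, 0, 0, 1, 1, 1]
print(two_stage_select(X_demo, y_demo, k=2, delta=0.3))
```

Dependency only counts samples in the lower approximation, which is the limitation the abstract points to. The `upper` masks returned by `neighborhood_approximations` expose the boundary information that a fuzziness-style measure would draw on.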

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61976082 and 62002103.

Author information

Authors and Affiliations

College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China
Jiucheng Xu, Kanglin Qu, Yuanhao Sun & Jie Yang
Engineering Lab of Intelligence Business & Internet of Things, Henan Province, Xinxiang, 453007, China
Jiucheng Xu, Kanglin Qu, Yuanhao Sun & Jie Yang

Contributions

Conceptualization: Jiucheng Xu; Methodology: Kanglin Qu; Writing - original draft preparation: Kanglin Qu, Yuanhao Sun, Jie Yang; Writing - review and editing: Yuanhao Sun, Jie Yang; Funding acquisition: Jiucheng Xu.

Corresponding authors

Correspondence to Kanglin Qu or Yuanhao Sun.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xu, J., Qu, K., Sun, Y. et al. Feature selection using self-information uncertainty measures in neighborhood information systems. Appl Intell 53, 4524–4540 (2023). https://doi.org/10.1007/s10489-022-03760-5

Accepted: 10 May 2022
Published: 11 June 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s10489-022-03760-5
