Abstract
The neighborhood rough set (NRS) model has been widely applied to feature selection. However, the dependency, an important feature evaluation function in NRS, considers only the classification information in the lower approximation and ignores that in the upper approximation, which weakens its ability to evaluate features. To address this, this paper first defines a fuzziness measure using the upper approximation and proposes two self-information uncertainty measures based on the dependency and the fuzziness. Second, these two measures are combined into a more comprehensive approximate self-information for evaluating the uncertainty of the classification information of feature subsets, and a heuristic feature selection algorithm is constructed on top of it. Third, to reduce the time cost of this algorithm on high-dimensional datasets, we propose a two-stage selection strategy: the first stage uses the Fisher score (FS) dimensionality reduction method, which has low time cost and stable performance, to retain the important features of the high-dimensional dataset as a candidate feature subset, and the second stage applies our algorithm to reduce the candidate feature subset further. Finally, the results of several feature selection algorithms on eleven datasets are reported, and the comparison confirms that our algorithm is efficient.
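The two-stage strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the definitions of the fuzziness or of the approximate self-information, so the second stage below substitutes the classical neighborhood dependency (the size of the positive region divided by the number of samples) as a stand-in evaluation function. The function names, the Euclidean distance, the neighborhood radius `delta`, the candidate count `k`, and the toy data are all assumptions made for illustration.

```python
from math import sqrt

def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    n, m = len(X), len(X[0])
    scores = []
    for j in range(m):
        col = [row[j] for row in X]
        mean_all = sum(col) / n
        between = within = 0.0
        for c in set(y):
            vals = [v for v, label in zip(col, y) if label == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - mean_all) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

def neighborhood(X, i, features, delta):
    """Indices of samples within Euclidean distance delta of sample i."""
    xi = X[i]
    return [k for k, xk in enumerate(X)
            if sqrt(sum((xi[f] - xk[f]) ** 2 for f in features)) <= delta]

def approximations(X, y, features, delta):
    """Lower and upper approximations of every decision class."""
    lower = {c: set() for c in set(y)}
    upper = {c: set() for c in set(y)}
    for i in range(len(X)):
        labels = {y[k] for k in neighborhood(X, i, features, delta)}
        for c in labels:            # neighborhood intersects class c
            upper[c].add(i)
        if len(labels) == 1:        # neighborhood lies inside one class
            lower[y[i]].add(i)
    return lower, upper

def dependency(X, y, features, delta):
    """Fraction of samples in the positive region (union of lower approximations)."""
    lower, _ = approximations(X, y, features, delta)
    return sum(len(s) for s in lower.values()) / len(X)

def two_stage_select(X, y, k, delta):
    """Stage 1: keep the k best features by Fisher score.
    Stage 2: greedy forward selection over those candidates, using the
    dependency as a stand-in for the paper's approximate self-information."""
    scores = fisher_scores(X, y)
    candidates = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    selected, best = [], 0.0
    while True:
        trials = [(dependency(X, y, selected + [f], delta), f)
                  for f in candidates if f not in selected]
        if not trials:
            break
        value, f = max(trials, key=lambda t: (t[0], -t[1]))
        if value <= best:           # no candidate improves the evaluation
            break
        selected.append(f)
        best = value
    return selected

# Toy data scaled to [0, 1]: feature 0 separates the classes,
# feature 1 is noisy, feature 2 is constant.
X = [[0.10, 0.5, 0.5], [0.20, 0.1, 0.5], [0.15, 0.9, 0.5],
     [0.80, 0.4, 0.5], [0.90, 0.8, 0.5], [0.85, 0.2, 0.5]]
y = [0, 0, 0, 1, 1, 1]
print(two_stage_select(X, y, k=2, delta=0.2))  # [0]
```

In this sketch the Fisher score stage discards the constant feature cheaply, so the more expensive neighborhood computations run only on the two remaining candidates. The `upper` sets returned by `approximations` are where an upper-approximation-based measure such as the paper's fuzziness would draw its information; the sketch computes them but does not attempt to reproduce that measure.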
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 61976082 and 62002103.
Author information
Contributions
Conceptualization: Jiucheng Xu; Methodology: Kanglin Qu; Writing - original draft preparation: Kanglin Qu, Yuanhao Sun, Jie Yang; Writing - review and editing: Yuanhao Sun, Jie Yang; Funding acquisition: Jiucheng Xu.
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Xu, J., Qu, K., Sun, Y. et al. Feature selection using self-information uncertainty measures in neighborhood information systems. Appl Intell 53, 4524–4540 (2023). https://doi.org/10.1007/s10489-022-03760-5
DOI: https://doi.org/10.1007/s10489-022-03760-5