Feature selection using self-information uncertainty measures in neighborhood information systems

Abstract

The neighborhood rough set (NRS) model has been widely applied to feature selection. However, the dependency, a significant feature evaluation function in NRS, focuses only on the classification information contained in the lower approximation and ignores the classification information contained in the upper approximation, which weakens its evaluation ability. Consequently, this paper first defines fuzziness using the upper approximation and proposes two self-information uncertainty measures based on the dependency and the fuzziness. Second, by combining these two measures, a more comprehensive approximate self-information is proposed for evaluating the uncertainty of the classification information of feature subsets, and a heuristic feature selection algorithm is constructed on the basis of this approximate self-information. Third, to reduce the time cost of the constructed algorithm on high-dimensional datasets, we propose a two-stage selection strategy: the first stage applies the Fisher score (FS) dimensionality reduction method, which has low time cost and stable performance, to retain the important features of a high-dimensional dataset as a candidate feature subset, and the second stage employs our algorithm to further reduce the candidate feature subset. Finally, the results of various feature selection algorithms on eleven datasets are presented, and the comparisons confirm that our algorithm is efficient.
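
As a rough illustration of the two-stage strategy described above, the following Python snippet is a minimal sketch under stated assumptions: the functions fisher_score, neighborhood_dependency, and two_stage_selection, the neighborhood radius delta, and the stopping threshold eps are illustrative choices of ours, not the paper's definitions. In particular, the second-stage criterion shown here is the classical lower-approximation neighborhood dependency; the paper's approximate self-information measure, which also incorporates upper-approximation fuzziness, is not reproduced on this page.

```python
import numpy as np

def fisher_score(X, y):
    # Between-class scatter over within-class scatter, computed per feature.
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    numer = np.zeros(X.shape[1])
    denom = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        numer += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        denom += len(Xc) * Xc.var(axis=0)
    return numer / (denom + 1e-12)

def neighborhood_dependency(X, y, features, delta=0.15):
    # |POS_B(D)| / |U|: fraction of samples whose delta-neighborhood
    # (measured on the selected features) is pure in the decision attribute.
    Xb = X[:, features]
    dists = np.linalg.norm(Xb[:, None, :] - Xb[None, :, :], axis=2)
    pos = sum(np.all(y[dists[i] <= delta] == y[i]) for i in range(len(X)))
    return pos / len(X)

def two_stage_selection(X, y, n_candidates=50, delta=0.15, eps=1e-4):
    # Stage 1: keep the top-ranked features by Fisher score as candidates.
    candidates = list(np.argsort(fisher_score(X, y))[::-1][:n_candidates])
    # Stage 2: greedy forward selection guided by neighborhood dependency.
    selected, best = [], 0.0
    while candidates:
        gain, f = max((neighborhood_dependency(X, y, selected + [f], delta), f)
                      for f in candidates)
        if gain - best < eps:          # stop when no candidate improves enough
            break
        selected.append(f)
        candidates.remove(f)
        best = gain
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((120, 200))                    # toy data scaled to [0, 1]
    y = (X[:, 3] + X[:, 7] > 1.0).astype(int)     # labels driven by two features
    print(two_stage_selection(X, y, n_candidates=30))
```

In the paper's method, the dependency-based criterion in the second stage would be replaced by the approximate self-information built from both the dependency and the upper-approximation fuzziness.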

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61976082 and 62002103.

Author information

Contributions

Conceptualization: Jiucheng Xu; Methodology: Kanglin Qu; Writing - original draft preparation: Kanglin Qu, Yuanhao Sun, Jie Yang; Writing - review and editing: Yuanhao Sun, Jie Yang; Funding acquisition: Jiucheng Xu.

Corresponding authors

Correspondence to Kanglin Qu or Yuanhao Sun.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xu, J., Qu, K., Sun, Y. et al. Feature selection using self-information uncertainty measures in neighborhood information systems. Appl Intell 53, 4524–4540 (2023). https://doi.org/10.1007/s10489-022-03760-5
