
Multi-label feature selection via redundancy of the selected feature set

Applied Intelligence

Abstract

With the growing volume of high-dimensional multi-label data in modern applications, multi-label feature selection has become an important problem. Traditional multi-label feature selection algorithms focus on evaluating the relevance between candidate features and the label set, and neglect the impact of the already selected feature set. Few studies analyze the redundancy among the selected features, so the most discriminative features can be overlooked. To address this problem, we propose a novel multi-label feature selection algorithm based on fuzzy rough sets. First, we define a redundancy weight for each selected feature via fuzzy interaction information to evaluate the correlation between features in the selected feature set, and design an instance equivalence matrix based on the redundancy weight. Second, we define fuzzy conditional mutual information to evaluate the relevance between candidate features and the label set given the selected feature set. Finally, we combine the redundancy analysis with feature relevance to design the multi-label feature selection algorithm. To verify its performance, we compare the proposed algorithm with nine representative feature selection algorithms on synthetic and real-world datasets. The experimental results and statistical tests show that the proposed algorithm outperforms the compared algorithms.
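The relevance step described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the exact definitions, so the code below assumes the commonly used fuzzy information entropy over a fuzzy similarity relation (the average of -log2(|[x_i]_R| / n) over all instances), assumes feature and label values are scaled to [0, 1], and leaves out the redundancy weight and the instance equivalence matrix entirely. All function names (`similarity`, `intersect`, `entropy`, `cond_mi`, `greedy_select`) are hypothetical.

```python
import math


def similarity(values):
    """Fuzzy similarity relation r_ij = 1 - |v_i - v_j| (assumes values in [0, 1])."""
    return [[1.0 - abs(a - b) for b in values] for a in values]


def intersect(r, s):
    """Elementwise minimum: the fuzzy relation induced jointly by r and s."""
    return [[min(x, y) for x, y in zip(row_r, row_s)] for row_r, row_s in zip(r, s)]


def entropy(r):
    """Fuzzy information entropy: mean of -log2(|[x_i]_R| / n) over instances."""
    n = len(r)
    return -sum(math.log2(sum(row) / n) for row in r) / n


def cond_mi(a, b, c=None):
    """Fuzzy mutual information I(a; b), or I(a; b | c) when c is given."""
    if c is None:
        return entropy(a) + entropy(b) - entropy(intersect(a, b))
    ac, bc = intersect(a, c), intersect(b, c)
    return entropy(ac) + entropy(bc) - entropy(intersect(ac, b)) - entropy(c)


def greedy_select(features, labels, k):
    """Forward selection: repeatedly add the candidate feature with the largest
    fuzzy conditional mutual information with the label set, given the
    features selected so far. `features` and `labels` are lists of columns."""
    feat_rel = [similarity(col) for col in features]
    label_rel = None
    for col in labels:  # joint relation over all labels
        rel = similarity(col)
        label_rel = rel if label_rel is None else intersect(label_rel, rel)
    selected, selected_rel = [], None
    for _ in range(min(k, len(features))):
        candidates = [j for j in range(len(features)) if j not in selected]
        best = max(candidates,
                   key=lambda j: cond_mi(feat_rel[j], label_rel, selected_rel))
        selected.append(best)
        selected_rel = (feat_rel[best] if selected_rel is None
                        else intersect(selected_rel, feat_rel[best]))
    return selected
```

As a toy check, take three feature columns `[0, 0, 1, 1]`, `[0, 0, 1, 1]`, `[0, 1, 0, 1]` and two label columns `[0, 0, 1, 1]`, `[0, 1, 0, 1]`. Calling `greedy_select` with `k=2` picks the first feature and then the third: once the first feature is selected, its exact duplicate carries zero conditional mutual information with the labels, which is the kind of redundancy the abstract says relevance-only criteria miss.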



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. , 61772226 and 61862056), the Natural Science Foundation of Jilin Province (Grant No. 20200201159JC), and the Key Laboratory for Symbol Computation and Knowledge Engineering of the National Education Ministry of China, Jilin University.

Author information

Corresponding author

Correspondence to Guixia Liu.


About this article


Cite this article

Zhong, H., Zhang, P. & Liu, G. Multi-label feature selection via redundancy of the selected feature set. Appl Intell 53, 11073–11091 (2023). https://doi.org/10.1007/s10489-022-03365-y


  • DOI: https://doi.org/10.1007/s10489-022-03365-y
