Multi-label feature selection for missing labels by granular-ball based mutual information

Applied Intelligence

Abstract

Multi-label feature selection serves as an effective dimensionality reduction technique for high-dimensional multi-label data. However, most feature selection methods assume that the labels are complete. In real-world applications, labels in a multi-label dataset may be missing because collecting sufficient labels is difficult, which causes valuable information to be overlooked and leads to inaccurate classification predictions. To address these issues, this paper proposes a feature selection algorithm based on granular-ball based mutual information for multi-label data with missing labels. First, to improve classification ability, a label recovery model is proposed to recover missing labels; it exploits the correlation between labels together with the properties of label-specific features and global common features. Second, to avoid computing a neighborhood radius, a granular-ball based mutual information metric that fits the data distribution well is proposed for evaluating candidate features. Finally, the corresponding feature selection algorithm is developed to select a feature subset from multi-label data with missing labels. Experiments on several datasets demonstrate that the proposed algorithm considerably improves classification accuracy compared with state-of-the-art algorithms. The code is publicly available at https://github.com/skylark-leo/MLMLFS.git
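
To make the "no neighborhood radius" point concrete, the following is a minimal sketch of generic granular-ball generation: a ball is split by 2-means until it is pure enough with respect to one label, so each ball's radius falls out of the data rather than being set by hand. It is an illustration under stated assumptions, not the algorithm of this paper or the code in the linked repository. The function names (`granular_balls`, `two_means_split`, `purity`), the purity threshold of 0.9, the minimum ball size, the use of a single binary label column, and the radius taken as the mean distance to the center are all assumptions made for this example.

```python
import numpy as np


def two_means_split(X, idx, rng, n_iter=10):
    """Split the samples in `idx` into two groups with a few 2-means steps."""
    pts = X[idx]
    # Start from two distinct samples of the ball (hypothetical init choice).
    centers = pts[rng.choice(len(idx), size=2, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                centers[k] = pts[assign == k].mean(axis=0)
    return idx[assign == 0], idx[assign == 1]


def purity(y, idx):
    """Fraction of samples in the ball that share the majority label value."""
    counts = np.bincount(y[idx], minlength=2)
    return counts.max() / len(idx)


def granular_balls(X, y, purity_threshold=0.9, min_size=2, seed=0):
    """Cover the samples with balls that are pure enough w.r.t. a binary label.

    Assumption: y is one binary label column; a multi-label dataset would
    need some rule for combining labels, which is not sketched here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    rng = np.random.default_rng(seed)
    pending = [np.arange(len(X))]
    balls = []
    while pending:
        idx = pending.pop()
        done = len(idx) <= max(min_size, 1) or purity(y, idx) >= purity_threshold
        if not done:
            left, right = two_means_split(X, idx, rng)
            # Keep the ball whole if the split failed to separate anything.
            done = len(left) == 0 or len(right) == 0
        if done:
            center = X[idx].mean(axis=0)
            # Radius as mean distance to the center (one possible definition).
            radius = np.linalg.norm(X[idx] - center, axis=1).mean()
            balls.append({"members": idx, "center": center, "radius": radius})
        else:
            pending.extend([left, right])
    return balls


# Toy data: two well-separated groups, one label value per group.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(3.0, 0.3, (30, 2))])
y_demo = np.array([0] * 30 + [1] * 30)
for ball in granular_balls(X_demo, y_demo):
    print(len(ball["members"]), ball["center"].round(2), round(float(ball["radius"]), 2))
```

The balls form a partition of the samples, so ball membership can play the role that a fixed-radius neighborhood plays in neighborhood-based measures. How the paper turns such balls into a mutual information metric is not stated in the abstract and is not reproduced here.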

Data Availability

Data will be made available on request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 62266018 and 62366019) and the Natural Science Foundation of Jiangxi Province (No. 20232BAB202052).

Author information

Contributions

Wenhao Shu: Conceptualization, Methodology, Visualization, Writing - original draft. Yichen Hu: Data curation, Software, Validation, Formal analysis, Writing - original draft. Wenbin Qian: Investigation, Supervision, Writing - review & editing.

Corresponding author

Correspondence to Wenbin Qian.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical standard

Not applicable to this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shu, W., Hu, Y. & Qian, W. Multi-label feature selection for missing labels by granular-ball based mutual information. Appl Intell 54, 12589–12612 (2024). https://doi.org/10.1007/s10489-024-05809-z

  • DOI: https://doi.org/10.1007/s10489-024-05809-z
