Abstract
Multi-label feature selection serves as an effective dimensionality reduction technique for high-dimensional multi-label data. However, most feature selection methods assume that the labels are complete. In real-world applications, labels in a multi-label dataset may be missing because of the difficulty of collecting sufficient labels, which causes valuable information to be overlooked and leads to inaccurate predictions in classification. To address these issues, this paper proposes a feature selection algorithm based on granular-ball based mutual information for multi-label data with missing labels. First, to improve classification ability, a label recovery model is proposed to recover some of the missing labels; it exploits the correlation between labels, the properties of label-specific features, and global common features. Second, to avoid computing the neighborhood radius, a granular-ball based mutual information metric that fits the data distribution well is proposed for evaluating candidate features. Finally, the corresponding feature selection algorithm is developed to select a feature subset from multi-label data with missing labels. Experiments on different datasets demonstrate that the proposed algorithm considerably improves classification accuracy compared with state-of-the-art algorithms. The code is publicly available at https://github.com/skylark-leo/MLMLFS.git
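To make the idea of a radius-free, granular-ball based relevance score more concrete, the following is a minimal illustrative sketch and not the authors' implementation (see the linked repository for that). It assumes that the label matrix is already complete (for example, after a recovery step), that balls are grown by repeatedly splitting impure balls with a small 2-means step, and that a feature is scored by the mutual information between ball membership and each label. All function names and the parameters `purity_min` and `min_size` are hypothetical choices made for this example.

```python
import numpy as np


def split_ball(X, idx, rng, n_iter=10):
    """Split one ball into two with a tiny 2-means; returns two index arrays."""
    pts = X[idx]
    centers = pts[rng.choice(len(idx), size=2, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                centers[k] = pts[assign == k].mean(axis=0)
    return idx[assign == 0], idx[assign == 1]


def purity(y):
    """Fraction of samples in a ball that carry the majority label."""
    return np.bincount(y).max() / len(y)


def granular_balls(X, y, purity_min=0.9, min_size=4, seed=0):
    """Partition sample indices into balls that are pure enough or small."""
    rng = np.random.default_rng(seed)
    queue = [np.arange(len(X))]
    balls = []
    while queue:
        idx = queue.pop()
        if len(idx) <= min_size or purity(y[idx]) >= purity_min:
            balls.append(idx)
            continue
        a, b = split_ball(X, idx, rng)
        if len(a) == 0 or len(b) == 0:  # degenerate split, e.g. duplicates
            balls.append(idx)
        else:
            queue += [a, b]
    return balls


def entropy(counts):
    """Shannon entropy (in bits) of a vector of class counts."""
    p = counts[counts > 0] / counts.sum()
    return -(p * np.log2(p)).sum()


def ball_mutual_information(balls, y):
    """I(B; Y) = H(Y) - H(Y | B), where B is the ball membership."""
    n = len(y)
    h_y = entropy(np.bincount(y))
    h_y_given_b = sum(len(i) / n * entropy(np.bincount(y[i])) for i in balls)
    return h_y - h_y_given_b


def score_features(X, Y, **kwargs):
    """Average ball-based mutual information of each feature over all labels."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=int)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for l in range(Y.shape[1]):
            balls = granular_balls(X[:, [j]], Y[:, l], **kwargs)
            scores[j] += ball_mutual_information(balls, Y[:, l])
    return scores / Y.shape[1]
```

Calling `score_features(X, Y)` with an `(n, d)` feature matrix and an `(n, q)` binary label matrix returns one score per feature, which could then be used to rank candidates. Because the balls adapt to the data, no neighborhood radius has to be chosen; note, however, that very small balls tend to inflate the estimate, so `min_size` should not be set too low in this sketch.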
Data Availability
Data will be made available on request.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62266018 and No. 62366019) and the Natural Science Foundation of Jiangxi Province (No. 20232BAB202052).
Author information
Contributions
Wenhao Shu: Conceptualization, Methodology, Visualization, Writing - original draft. Yichen Hu: Data curation, Software, Validation, Formal analysis, Writing - original draft. Wenbin Qian: Investigation, Supervision, Writing - review & editing.
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical standard
Not applicable to this paper.
About this article
Cite this article
Shu, W., Hu, Y. & Qian, W. Multi-label feature selection for missing labels by granular-ball based mutual information. Appl Intell 54, 12589–12612 (2024). https://doi.org/10.1007/s10489-024-05809-z
DOI: https://doi.org/10.1007/s10489-024-05809-z