Abstract
Data in multi-label learning are usually high-dimensional, which makes computation expensive. Feature selection, an important dimensionality reduction technique, has therefore attracted the attention of many researchers. Label imbalance is a further difficulty in multi-label learning. To tackle both problems, we propose a new multi-label feature selection algorithm named IMRFS, which combines manifold learning with label imbalance information. First, to preserve the manifold structure between samples, a graph Laplacian is used to construct the manifold regularization term. In addition, the local manifold structure of each label is considered in order to capture the correlation between labels, and the imbalanced distribution of the labels is embedded into this label manifold structure. Furthermore, to ensure the robustness and sparsity of IMRFS, the L2,1-norm is applied to both the loss function and the sparse regularization term. We then adopt an iterative strategy to optimize the objective function of IMRFS. Finally, comparison results on multiple datasets show the effectiveness of the IMRFS method.
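The building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the IMRFS objective itself, which the abstract does not state. The function names, the binary k-nearest-neighbour graph, and the majority-to-minority definition of the imbalance ratio are all assumptions chosen for illustration.

```python
import numpy as np


def l21_norm(W):
    """L2,1-norm: the sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def knn_graph_laplacian(X, k=3):
    """Unnormalized Laplacian L = D - S of a symmetric kNN graph.

    Assumption: binary edge weights; a paper may use heat-kernel weights.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples (rows of X).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    S = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(sq[i])
        neighbours = [j for j in order if j != i][:k]
        S[i, neighbours] = 1.0
    S = np.maximum(S, S.T)  # symmetrize the adjacency matrix
    return np.diag(S.sum(axis=1)) - S


def manifold_penalty(F, L):
    """tr(F^T L F), which equals 0.5 * sum_ij S_ij * ||F_i - F_j||^2."""
    return float(np.trace(F.T @ L @ F))


def imbalance_ratio(Y):
    """Per-label majority/minority count ratio for a binary label matrix Y.

    Assumption: this is one common definition, not necessarily the paper's.
    """
    pos = Y.sum(axis=0)
    neg = Y.shape[0] - pos
    minority = np.maximum(np.minimum(pos, neg), 1)  # avoid division by zero
    return np.maximum(pos, neg) / minority
```

In a regularized objective of this family, `manifold_penalty` would typically be applied to the predictions `X @ W`, so that samples that are close in feature space receive similar predicted labels, while `l21_norm(W)` pushes whole rows of `W` (that is, whole features) towards zero.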
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 61976182, 62076171, 61876157), the Key Program for International S&T Cooperation of Sichuan Province (2019YFH0097), and the Sichuan Key R&D Project (2020YFG0035).
Cite this article
Lu, H., Chen, H., Li, T. et al. Multi-label feature selection based on manifold regularization and imbalance ratio. Appl Intell 52, 11652–11671 (2022). https://doi.org/10.1007/s10489-021-03141-4
DOI: https://doi.org/10.1007/s10489-021-03141-4