Multi-label feature selection based on manifold regularization and imbalance ratio

Applied Intelligence

Abstract

Data in multi-label learning are usually high-dimensional, which makes computation expensive. Feature selection, an important dimensionality reduction technique, has therefore attracted considerable research attention. Label imbalance is another factor that complicates multi-label learning. To tackle both problems, we propose a new multi-label feature selection algorithm named IMRFS, which combines manifold learning with label imbalance information. First, to preserve the manifold structure among samples, a graph Laplacian is used to construct the manifold regularization term. In addition, the local manifold structure of each label is considered in order to capture correlations between labels, and the imbalanced distribution of labels is embedded into this label manifold structure. Furthermore, to ensure the robustness and sparsity of IMRFS, the L2,1-norm is applied to both the loss function and the sparse regularization term. We then adopt an iterative strategy to optimize the objective function of IMRFS. Finally, comparison results on multiple datasets show the effectiveness of IMRFS.
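The abstract names three building blocks: a graph Laplacian for manifold regularization, the L2,1-norm, and a per-label imbalance ratio. The sketch below illustrates generic versions of each in NumPy. It is not the IMRFS objective or its optimizer, which the abstract does not specify. The function names, the heat-kernel kNN graph, the parameters `k` and `sigma`, and the majority/minority definition of the imbalance ratio are all assumptions made for illustration.

```python
import numpy as np

def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalized Laplacian L = D - S of a symmetric kNN heat-kernel graph.

    Assumed construction for illustration; the paper may build its graph differently.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    S = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbours of sample i, excluding the sample itself.
        nbrs = [j for j in np.argsort(d2[i]) if j != i][:k]
        S[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    S = np.maximum(S, S.T)  # symmetrize the affinity matrix
    return np.diag(S.sum(axis=1)) - S

def manifold_penalty(F, L):
    """tr(F^T L F): small when rows of F are similar for samples linked in the graph."""
    return float(np.trace(F.T @ L @ F))

def l21_norm(W):
    """L2,1-norm: sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.sqrt(np.sum(W ** 2, axis=1))))

def imbalance_ratio(Y):
    """Per-label ratio of majority to minority class counts (assumed definition).

    Y is an (n_samples, n_labels) binary matrix; the minority count is
    floored at 1 to avoid division by zero.
    """
    Y = np.asarray(Y)
    pos = Y.sum(axis=0).astype(float)
    neg = Y.shape[0] - pos
    return np.maximum(pos, neg) / np.maximum(np.minimum(pos, neg), 1.0)
```

In a sparse feature selection model of this general kind, the rows of a learned weight matrix `W` with the largest Euclidean norms typically indicate the selected features, which is why a row-wise norm such as `l21_norm` is used as the sparsity term.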

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61976182, 62076171, 61876157), Key program for International S&T Cooperation of Sichuan Province (2019YFH0097), Sichuan Key R&D project (2020YFG0035).

Author information

Corresponding author

Correspondence to Hongmei Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lu, H., Chen, H., Li, T. et al. Multi-label feature selection based on manifold regularization and imbalance ratio. Appl Intell 52, 11652–11671 (2022). https://doi.org/10.1007/s10489-021-03141-4

  • DOI: https://doi.org/10.1007/s10489-021-03141-4
