
Cost-sensitive feature selection on multi-label data via neighborhood granularity and label enhancement

Published in Applied Intelligence

Abstract

Multi-label feature selection is an efficient and effective pre-processing step in machine learning and data mining: it selects a feature subset that contributes more to multi-label classification while improving classifier performance. In real-world applications, an instance may be associated with multiple related labels of different relative importance, and acquiring different features usually incurs different costs, such as money and time. However, most existing work on multi-label feature selection does not consider these two critical issues simultaneously. Therefore, in this paper, we exploit the idea of neighborhood granularity to enhance traditional logical labels into label distributions that reveal the deeper supervised information hidden in multi-label data, and we further consider the effect of the test cost under three different distributions. Motivated by these issues, a novel test-cost-sensitive multi-label feature selection algorithm with label enhancement and neighborhood granularity is designed. The proposed algorithm is evaluated on ten publicly available benchmark multi-label datasets with six widely used metrics from two different aspects. Two groups of experimental results demonstrate that the proposed algorithm achieves satisfactory and superior performance over four state-of-the-art comparison algorithms, and that it is effective in improving learning performance while decreasing the total test cost of the selected feature subset.
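The abstract describes a label-enhancement step in which logical 0/1 labels are converted into label distributions by examining the neighborhood granule of each instance. The paper's exact formulation is not reproduced on this page, so the sketch below is only a plausible illustration of the idea, assuming a distance-radius neighborhood and frequency-based smoothing; the function name `enhance_labels` and its parameters are hypothetical.

```python
import numpy as np

def enhance_labels(X, Y, radius=0.3):
    """Enhance logical multi-labels into label distributions via
    neighborhood granules (illustrative sketch only; not the paper's
    exact formulation).

    X : (n, d) feature matrix, Y : (n, q) binary label matrix.
    Returns an (n, q) matrix whose rows sum to 1.
    """
    n = X.shape[0]
    D = np.zeros(Y.shape, dtype=float)
    for i in range(n):
        # Neighborhood granule of instance i under the given radius.
        dist = np.linalg.norm(X - X[i], axis=1)
        granule = dist <= radius
        # Label frequency inside the granule, with the instance's own
        # logical labels added so they remain dominant in the result.
        freq = Y[granule].sum(axis=0) + Y[i]
        total = freq.sum()
        D[i] = freq / total if total > 0 else freq
    return D
```

Each row of the returned matrix is a label distribution: labels shared by many near neighbors receive higher relative importance than labels that are rare in the granule, which matches the abstract's goal of recovering the relative importance hidden behind logical labels.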




Acknowledgments

This work is supported by the National Natural Science Foundation of China (No.61966016 and 61662023), the Natural Science Foundation of Jiangxi Province (No.20192BAB207018), and the Scientific Research Project of Education department of Jiangxi Province (No. GJJ180200).

Author information


Corresponding author

Correspondence to Wenbin Qian.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Long, X., Qian, W., Wang, Y. et al. Cost-sensitive feature selection on multi-label data via neighborhood granularity and label enhancement. Appl Intell 51, 2210–2232 (2021). https://doi.org/10.1007/s10489-020-01993-w
