
Feature selection for multi-label classification by maximizing full-dimensional conditional mutual information

Abstract

Conditional mutual information (CMI) maximization is a promising criterion for stepwise, computationally efficient feature selection, but it is difficult to apply in its full form because of imprecise probability estimation and a heavy computational load. Many dimension-reduced CMI-based and mutual information (MI)-based methods have been reported to achieve state-of-the-art classification performance. However, these methods introduce model deviations into the CMI and MI formulations during dimension reduction. In this paper, we start from the full-dimensional CMI to address the feature selection problem, so that the full inter-feature and feature-label mutual information is retained when new features are selected. The cost function is approximated and simplified from a mathematical perspective to overcome the difficulties of maximizing the original full-dimensional CMI. A relationship is established between the proposed feature selection criterion and one based on Hilbert-Schmidt independence, which explains qualitatively how the new criterion achieves relevance maximization and redundancy minimization simultaneously. Experiments on real-world datasets demonstrate the superiority of the proposed method over existing ones.
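
To make the stepwise criterion concrete, the sketch below performs greedy forward selection, scoring each candidate feature by its mutual information with the label conditioned on the features already chosen. It is only an illustration of the general CMI-maximization idea, not the authors' algorithm: it uses a naive histogram (plug-in) estimator on pre-discretized data, conditions only on the selected subset rather than the full-dimensional CMI treated in the paper, handles a single label, and the function names and toy data are hypothetical.

```python
# Illustrative sketch only (not the paper's method): greedy forward
# feature selection that maximizes a plug-in estimate of the
# conditional mutual information I(X_j; y | selected) on discretized data.
import numpy as np


def cond_mutual_info(x, y, z):
    """Plug-in estimate of I(x; y | z) for discrete 1-D arrays.

    z encodes the joint state of the conditioning feature set."""
    cmi = 0.0
    for zv in np.unique(z):
        mask = (z == zv)
        pz = mask.mean()                                 # p(z)
        xs, ys = x[mask], y[mask]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                pxy = np.mean((xs == xv) & (ys == yv))   # p(x, y | z)
                px = np.mean(xs == xv)                   # p(x | z)
                py = np.mean(ys == yv)                   # p(y | z)
                if pxy > 0:
                    cmi += pz * pxy * np.log(pxy / (px * py))
    return cmi


def joint_state(features):
    """Encode several discrete columns as one joint discrete variable."""
    if features.shape[1] == 0:
        return np.zeros(features.shape[0], dtype=int)
    _, codes = np.unique(features, axis=0, return_inverse=True)
    return codes.reshape(-1)


def greedy_cmi_selection(X, y, k):
    """Select k feature indices by greedily maximizing I(X_j; y | selected)."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        z = joint_state(X[:, selected])
        scores = {j: cond_mutual_info(X[:, j], y, z) for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage: 200 samples, 6 pre-discretized features, one binary label
# that depends on features 0 and 2 (hypothetical data).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 6))
y = (X[:, 0] + X[:, 2] > 2).astype(int)
print(greedy_cmi_selection(X, y, 2))        # expected to pick 0 and 2
```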

Notes

  1. In this paper, we consider the multi-label problem to avoid loss of generality. The single-label problem is the special case obtained by setting K = 1 and Y = y.

  2. A multiplier of K is introduced to weight the inter-feature mutual information term for independent labels, since K-label models are considered in this paper instead of the single-label model in [27].

  3. Available at: http://pubchem.ncbi.nlm.nih.gov

  4. Assay IDs include: 1416 (PERK), 1446 (JAK2), 1481 (ATPase), 1531 (MEK)

  5. More detailed descriptions of the two datasets can be found in [16].

References

  1. Bache K, Lichman M (2013) UCI machine learning repository

  2. Bennasar M, Hicks Y, Setchi R (2015) Feature selection using joint mutual information maximisation. Expert Syst Appl 42(22):8520–8532

  3. Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1):245–271

  4. Brown G, Pocock A, Zhao MJ, Luján M (2012) Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J Mach Learn Res 13:27–66

  5. Bu Z, Li HJ, Zhang C, Cao J, Li A, Shi Y (2019) Graph k-means based on leader identification, dynamic game and opinion dynamics. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2019.2903712

  6. Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28

  7. Chen Y, Bi J, Wang J (2006) MILES: multiple-instance learning via embedded instance selection. IEEE Trans Pattern Anal Mach Intell 28(12):1931–1947

  8. Fleuret F (2004) Fast binary feature selection with conditional mutual information. J Mach Learn Res 5:1531–1555

  9. Gretton A, Bousquet O, Smola A, Schölkopf B (2005) Measuring statistical dependence with Hilbert-Schmidt norms. In: International conference on algorithmic learning theory. Springer, pp 63–77

  10. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  11. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37

  12. Janecek A, Gansterer WN, Demel M, Ecker G (2008) On the relationship between feature selection and classification accuracy. FSDM 4:90–105

  13. Kalousis A, Prados J, Hilario M (2007) Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl Inf Syst 12(1):95–116

  14. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1-2):273–324

  15. Koller D, Sahami M (1996) Toward optimal feature selection. Technical report, Stanford InfoLab

  16. Kong X, Yu PS (2010) Multi-label feature selection for graph classification. In: 2010 IEEE 10th international conference on data mining (ICDM). IEEE, pp 274–283

  17. Kwak N, Choi CH (2002) Input feature selection for classification problems. IEEE Trans Neural Netw 13(1):143–159

  18. Li HJ, Bu Z, Wang Z, Cao J (2020) Dynamical clustering in electronic commerce systems via optimization and leadership expansion. IEEE, pp 5327–5334

  19. Liu H, Motoda H, Setiono R, Zhao Z (2010) Feature selection: an ever evolving frontier in data mining. In: Feature selection in data mining, pp 4–13

  20. Liu H, Sun J, Liu L, Zhang H (2009) Feature selection with dynamic mutual information. Pattern Recogn 42(7):1330–1339

  21. Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502

  22. Yamada M, Jitkrittum W, Sigal L, Xing EP, Sugiyama M (2014) High-dimensional feature selection by feature-wise kernelized lasso. Neural Comput 26(1):185–207

  23. Mitra P, Murthy C, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312

  24. Nakariyakul S, Casasent DP (2009) An improvement on floating search algorithms for feature subset selection. Pattern Recogn 42(9):1932–1940

  25. Neumann J, Schnörr C, Steidl G (2005) Combined SVM-based feature selection and classification. Mach Learn 61(1):129–150

  26. Pappu V, Pardalos PM (2014) High-dimensional data classification. In: Clusters, orders, and trees: methods and applications. Springer, pp 119–150

  27. Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238

  28. Pudil P, Novovičová J, Kittler J (1994) Floating search methods in feature selection. Pattern Recogn Lett 15(11):1119–1125

  29. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517

  30. Song L, Smola A, Gretton A, Bedo J, Borgwardt K (2012) Feature selection via dependence maximization. J Mach Learn Res 13:1393–1434

  31. Sugiyama M (2012) Machine learning with squared-loss mutual information. Entropy 15(1):80–112

  32. Suzuki T, Sugiyama M, Kanamori T, Sese J (2009) Mutual information estimation reveals global associations between stimuli and biological processes. BMC Bioinform 10(1):S52

  33. Torkkola K (2003) Feature extraction by non-parametric mutual information maximization. J Mach Learn Res 3:1415–1438

  34. Tu CJ, Chuang LY, Chang JY, Yang CH et al (2007) Feature selection using PSO-SVM. International Journal of Computer Science

  35. Unler A, Murat A, Chinnam RB (2011) mr2PSO: a maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification. Inf Sci 181(20):4625–4641

  36. Vergara JR, Estévez PA (2014) A review of feature selection methods based on mutual information. Neural Comput Appl 24(1):175–186

  37. Wang J, Wei JM, Yang Z, Wang SQ (2017) Feature selection by maximizing independent classification information. IEEE Trans Knowl Data Eng 29(4):828–841

  38. Wang T, Lu J, Zhang G (2018) Two-stage fuzzy multiple kernel learning based on Hilbert-Schmidt independence criterion. IEEE, pp 1–1

  39. Yan X, Cheng H, Han J, Yu PS (2008) Mining significant graph patterns by leap search. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data. ACM, pp 433–444

  40. Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE international conference on data mining (ICDM). IEEE, pp 721–724

  41. Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: ICML, vol 3, pp 856–863

  42. Zhang ML, Zhou ZH (2007) ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048

  43. Zhang ML, Zhou ZH (2014) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837

  44. Zhang Y, Zhou ZH (2010) Multilabel dimensionality reduction via dependence maximization. ACM Trans Knowl Discov Data (TKDD) 4(3):14

  45. Zhou Y, Jin R, Hoi SC (2010) Exclusive lasso for multi-task feature selection. In: AISTATS, vol 9, pp 988–995


Author information

Corresponding author

Correspondence to Zhi-Chao Sha.


About this article


Cite this article

Sha, ZC., Liu, ZM., Ma, C. et al. Feature selection for multi-label classification by maximizing full-dimensional conditional mutual information. Appl Intell 51, 326–340 (2021). https://doi.org/10.1007/s10489-020-01822-0

