Abstract
Multi-label classification has gained extensive attention recently. Compared with traditional classification, multi-label classification allows one instance to be associated with multiple labels. The curse of dimensionality in multi-label data presents a challenge to the performance of multi-label classifiers. Multi-label feature selection is a powerful tool for high-dimensional problems. However, existing feature selection methods are unable to take both computational complexity and label correlation into consideration. To address this problem, a new approach based on information gain for multi-label feature selection (IGMF) is presented in this paper. In IGMF, the information gain between a feature and the label set is exploited to measure the importance of the feature and the label correlations. After that, the optimal feature subset is obtained by applying a threshold to these values. A series of experimental results shows that IGMF can improve the performance of multi-label classifiers.
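The abstract does not give the exact IGMF formula, so the following is only a minimal sketch of the general idea (score each feature by its information gain with the label set, then keep the features above a threshold). It assumes discrete features, treats the label set as one joint variable, and uses the symmetric form IG(f; L) = H(f) + H(L) - H(f, L). The function names, the toy data, and the threshold value are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """IG(f; L) = H(f) + H(L) - H(f, L).

    Assumption: the label set L is treated as a single joint variable,
    i.e. each instance's full label vector is one outcome.
    """
    label_tuples = [tuple(row) for row in labels]
    joint = list(zip(feature, label_tuples))
    return entropy(feature) + entropy(label_tuples) - entropy(joint)


def select_features(X, Y, threshold):
    """Return (selected column indices, all scores) for a given threshold."""
    n_features = len(X[0])
    scores = [
        information_gain([row[j] for row in X], Y) for j in range(n_features)
    ]
    selected = [j for j, s in enumerate(scores) if s > threshold]
    return selected, scores


# Illustrative toy data only: 4 instances, 3 discrete features, 2 labels.
X = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
Y = [
    [0, 0],
    [0, 0],
    [1, 1],
    [1, 0],
]

# The threshold 0.75 is an arbitrary value chosen for this example.
selected, scores = select_features(X, Y, threshold=0.75)
print(selected, scores)
```

In this toy example the first feature shares the most information with the label vectors, the second shares less, and the constant third feature shares none, so only the first feature passes the threshold. Treating the label vector as a joint variable is one simple way to let label co-occurrence influence the score; the paper itself may handle label correlation differently.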
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, L. et al. (2014). Multi-label Feature Selection via Information Gain. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_27
DOI: https://doi.org/10.1007/978-3-319-14717-8_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14716-1
Online ISBN: 978-3-319-14717-8
eBook Packages: Computer Science, Computer Science (R0)