Multi-label Feature Selection via Information Gain

  • Conference paper
Advanced Data Mining and Applications (ADMA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8933)

Abstract

Multi-label classification has recently attracted extensive attention. Unlike traditional classification, multi-label classification allows one instance to be associated with multiple labels. The curse of dimensionality in multi-label data poses a challenge to the performance of multi-label classifiers. Multi-label feature selection is a powerful tool for high-dimensional problems. However, existing feature selection methods are unable to take both computational complexity and label correlation into consideration. To address this problem, this paper presents a new approach to multi-label feature selection based on information gain (IGMF). In IGMF, the information gain between a feature and the label set is used to measure the importance of the feature and the label correlations. The optimal feature subset is then obtained by setting a threshold value. A series of experimental results shows that IGMF can improve the performance of multi-label classifiers.
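The abstract describes scoring each feature by its information gain with the label set and then keeping the features that pass a threshold. The following is a minimal sketch of that general idea, not the authors' IGMF implementation: the abstract does not give the exact formula, any normalization, or the threshold value. The sketch assumes discrete features, treats each instance's full label vector as one joint symbol so that co-occurring labels are scored together, and uses IG(f; L) = H(f) + H(L) - H(f, L). The function names, the toy data, and the threshold of 0.5 are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """IG(f; L) = H(f) + H(L) - H(f, L).

    Assumption: each row of `labels` (one instance's label vector) is
    treated as a single joint symbol, so label co-occurrence is kept.
    """
    label_sets = [tuple(row) for row in labels]
    joint = list(zip(feature, label_sets))
    return entropy(feature) + entropy(label_sets) - entropy(joint)


def select_features(X, Y, threshold):
    """Score every column of X against Y and keep those with IG >= threshold."""
    n_features = len(X[0])
    scores = [
        information_gain([row[j] for row in X], Y) for j in range(n_features)
    ]
    selected = [j for j, score in enumerate(scores) if score >= threshold]
    return selected, scores


# Hypothetical toy data: 4 instances, 3 discrete features, 2 labels.
X = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
Y = [
    [0, 0],
    [0, 0],
    [1, 1],
    [1, 1],
]

# The threshold here is an arbitrary illustrative value.
selected, scores = select_features(X, Y, threshold=0.5)
print(selected, scores)
```

On this toy data only feature 0 is kept: it determines the label vector exactly (1 bit of information gain), while feature 1 is independent of the labels and feature 2 is constant. Treating the label vector as a joint symbol is cheap on small label sets but yields sparse counts when many distinct label combinations occur, which is one reason a practical method may normalize or decompose the score differently.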

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, L. et al. (2014). Multi-label Feature Selection via Information Gain. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_27

  • DOI: https://doi.org/10.1007/978-3-319-14717-8_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14716-1

  • Online ISBN: 978-3-319-14717-8

  • eBook Packages: Computer Science, Computer Science (R0)
