Soft-constrained Laplacian score for semi-supervised multi-label feature selection

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Feature selection, semi-supervised learning and multi-label classification are distinct challenges for the machine learning and data mining communities. While other works have addressed each of these problems separately, in this paper we show how they can be addressed together. We propose a unified framework for semi-supervised multi-label feature selection based on the Laplacian score. In particular, we show how to constrain the function of this score when the data are partially labeled and each instance is associated with a set of labels. We transform the labeled part of the data into soft constraints and show how to integrate them into a measure of feature relevance that reflects the available labels. Experiments on benchmark data sets validate the proposed approach and compare it with other state-of-the-art feature selection methods in a multi-label context.
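
For orientation, the sketch below computes the standard unsupervised Laplacian score that the framework builds on. It is not the soft-constrained, multi-label score proposed in the paper, whose definition is not given in this abstract. The function name `laplacian_score`, the symmetric k-nearest-neighbour graph, the heat-kernel weights and the default values of `k` and `t` are all illustrative assumptions. Under this convention, a lower score indicates a feature that better preserves the local structure of the data.

```python
import numpy as np


def laplacian_score(X, k=5, t=1.0):
    """Unsupervised Laplacian score of each column of X (lower = more relevant).

    X is an (n_samples, n_features) array; k and t are illustrative defaults.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Symmetric k-nearest-neighbour graph (a sample is never its own neighbour).
    d2_no_self = d2.copy()
    np.fill_diagonal(d2_no_self, np.inf)
    nn = np.argsort(d2_no_self, axis=1)[:, :k]
    A = np.zeros((n, n), dtype=bool)
    A[np.arange(n)[:, None], nn] = True
    A = A | A.T

    # Heat-kernel edge weights, degree vector and graph Laplacian L = D - S.
    S = np.where(A, np.exp(-d2 / t), 0.0)
    deg = S.sum(axis=1)
    L = np.diag(deg) - S

    # Remove the degree-weighted mean of each feature.
    Xc = X - (deg @ X) / deg.sum()

    # Per-feature numerator f^T L f and denominator f^T D f.
    num = np.einsum("ij,ij->j", Xc, L @ Xc)
    den = np.einsum("ij,ij->j", Xc, deg[:, None] * Xc)

    # Assumption of this sketch: (near-)constant features get +inf so they rank last.
    scores = np.full_like(num, np.inf)
    np.divide(num, den, out=scores, where=den > 1e-12)
    return scores
```

Ranking features by ascending score, for example with `np.argsort(laplacian_score(X))`, gives a purely unsupervised ordering. A semi-supervised variant such as the one described in the abstract would additionally use the available labels, here in the form of soft constraints.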

Acknowledgments

We thank the anonymous reviewers for their very useful comments and suggestions. This work was carried out at LIRIS, Lab. CNRS 5205, at Lyon1 University. It was supported by an Algerian Research Scholarship (PNE: Programme National Exceptionnel).

Author information

Corresponding author

Correspondence to Khalid Benabdeslem.

Cite this article

Alalga, A., Benabdeslem, K. & Taleb, N. Soft-constrained Laplacian score for semi-supervised multi-label feature selection. Knowl Inf Syst 47, 75–98 (2016). https://doi.org/10.1007/s10115-015-0841-8

  • DOI: https://doi.org/10.1007/s10115-015-0841-8
