Abstract
Feature selection has received considerable attention over the past decade, yet it is continually challenged by newly emerging settings. Semi-supervised multi-label learning is one such setting: in this work, it refers to learning from data that combine a large number of unlabeled instances with a small number of multi-labeled ones. Like conventional feature selection algorithms, semi-supervised multi-label feature selection suffers from poor stability, i.e., limited robustness to changes in the data. To address this weakness and improve the robustness of feature selection in high-dimensional data, this paper develops an ensemble methodology based on a three-way resampling of the data: (1) bagging over instances, (2) the random subspace method (RSM) over features, and (3) an additional random sub-labeling strategy (RSL) over labels. The proposed framework enhances the stability of feature selection algorithms and improves their performance. Our findings show that bagging and RSM improve the stability of the feature selection process and increase learning accuracy, while RSL captures label correlation, a major concern with multi-label data. The paper reports the key findings of a series of experiments conducted on selected benchmark data sets for classification. The results are promising: the proposed method either outperforms state-of-the-art algorithms or produces at least comparable results.
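To make the three-way resampling concrete, the sketch below shows one way such an ensemble could be organized in Python/NumPy. It is a minimal illustration, not the authors' 3-3FS implementation: the names `triple_resample`, `ensemble_scores`, and `base_scorer` (any per-view feature relevance scorer, e.g., a semi-supervised Laplacian-score variant), as well as the sampling fractions, are assumptions made for this example.

```python
import numpy as np

def triple_resample(X, Y, n_views=10, feat_frac=0.5, label_frac=0.5, seed=None):
    """Yield ensemble views of a multi-label data set via three-way resampling:
    bagging over instances, a random subspace over features (RSM),
    and random sub-labeling over the label set (RSL).

    X : (n_samples, n_features) feature matrix
    Y : (n_samples, n_labels) label matrix; rows of unlabeled instances
        may hold placeholders (e.g., NaN), handled by the base scorer.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    q = Y.shape[1]
    for _ in range(n_views):
        rows = rng.choice(n, size=n, replace=True)                       # bagging
        feats = rng.choice(d, size=max(1, int(feat_frac * d)), replace=False)   # RSM
        labels = rng.choice(q, size=max(1, int(label_frac * q)), replace=False) # RSL
        yield X[np.ix_(rows, feats)], Y[np.ix_(rows, labels)], feats

def ensemble_scores(X, Y, base_scorer, **resample_kwargs):
    """Aggregate per-view feature scores into one global relevance score.

    base_scorer(X_view, Y_view) -> array of scores, one per feature in the view.
    """
    d = X.shape[1]
    total = np.zeros(d)
    counts = np.zeros(d)
    for X_view, Y_view, feats in triple_resample(X, Y, **resample_kwargs):
        total[feats] += base_scorer(X_view, Y_view)  # score features in this view
        counts[feats] += 1
    return total / np.maximum(counts, 1)  # mean score; unsampled features get 0

# Usage sketch: rank features by aggregated score and keep the top k.
# ranking = np.argsort(-ensemble_scores(X, Y, base_scorer, n_views=20, seed=0))
```

Averaging scores over many randomized views is what yields the stability gain: a feature must score well across multiple perturbed versions of the data to rank highly, so the final selection is less sensitive to any single sample, feature subset, or label subset.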
Acknowledgements
We thank the anonymous reviewers for their helpful comments and suggestions. The authors would also like to thank the DGRSDT (General Directorate of Scientific Research and Technological Development) of the MESRS (Ministry of Higher Education and Scientific Research), Algeria, for supporting the LISCO Laboratory.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Cite this article
Alalga, A., Benabdeslem, K. & Mansouri, D.E.K. 3-3FS: ensemble method for semi-supervised multi-label feature selection. Knowl Inf Syst 63, 2969–2999 (2021). https://doi.org/10.1007/s10115-021-01616-x