Kernelized fuzzy rough sets based online streaming feature selection for large-scale hierarchical classification

Bai, Shengxing; Lin, Yaojin; Lv, Yan; Chen, Jinkun; Wang, Chenxi

doi:10.1007/s10489-020-01863-5

Published: 30 September 2020

Volume 51, pages 1602–1615, (2021)

Applied Intelligence

Shengxing Bai^1,2, Yaojin Lin^1,2, Yan Lv^1,2, Jinkun Chen^3 & Chenxi Wang^1

Abstract

In recent years, many online streaming feature selection approaches have focused on flat data, that is, all data are treated as a whole. In the era of big data, however, not only is the feature space unknown in advance and evolving over time, but the label space also has a hierarchical structure. To address this problem, an online streaming feature selection framework for large-scale hierarchical classification tasks is proposed. The framework consists of three parts: (1) a new kernelized fuzzy rough set model for hierarchical data is constructed using a sibling strategy; (2) important features are selected online based on feature correlation analysis; and (3) redundant features are removed online based on feature redundancy. Finally, an empirical study on several hierarchical classification data sets shows that the proposed method outperforms other state-of-the-art online streaming feature selection methods.
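
The three parts listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the formulas, so the Gaussian kernel, the lower approximation sqrt(1 - k^2) taken from the general kernelized fuzzy rough set literature, the dependency-gain tests used for "correlation" and "redundancy", and every name and parameter (`sibling_dependency`, `online_select`, `delta`, `tol`, the toy hierarchy) are assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(X, delta=0.1):
    """Pairwise Gaussian kernel, used here as a fuzzy similarity relation (assumed)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / delta)

def sibling_dependency(X, y, siblings, delta=0.1):
    """Mean lower-approximation membership of each sample to its own class.

    Sibling strategy (assumed form): the negative samples of x are only the
    samples whose class is a sibling of x's class in the label hierarchy.
    """
    K = gaussian_kernel(X, delta)
    dist = np.sqrt(np.clip(1.0 - K ** 2, 0.0, None))
    total = 0.0
    for i, label in enumerate(y):
        neg = np.isin(y, list(siblings[label]))
        # No sibling samples at all: nothing contradicts x, so membership is 1.
        total += dist[i, neg].min() if neg.any() else 1.0
    return total / len(y)

def online_select(stream, y, siblings, delta=0.1, tol=1e-6):
    """Process features one at a time; keep a feature only if it raises the
    dependency, then drop earlier features that have become redundant."""
    selected, best = [], 0.0          # selected holds (name, column) pairs
    for name, col in stream:
        cand = selected + [(name, col)]
        dep = sibling_dependency(np.column_stack([c for _, c in cand]), y, siblings, delta)
        if dep <= best + tol:
            continue                  # no gain: discard the arriving feature
        selected, best = cand, dep
        for old in selected[:-1]:     # redundancy check on older features
            rest = [f for f in selected if f is not old]
            dep_rest = sibling_dependency(np.column_stack([c for _, c in rest]), y, siblings, delta)
            if dep_rest >= best - tol:
                selected, best = rest, dep_rest
    return [n for n, _ in selected], best

# Toy hierarchy (hypothetical): root -> {animal: {cat, dog}, vehicle: {car, bus}}
y = np.array(["cat", "cat", "dog", "dog", "car", "car", "bus", "bus"])
siblings = {"cat": {"dog"}, "dog": {"cat"}, "car": {"bus"}, "bus": {"car"}}
stream = [
    ("noise",  np.zeros(8)),                                    # separates nothing
    ("weak",   np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)),  # cat vs dog only
    ("strong", np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)),  # both sibling pairs
    ("copy",   np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)),  # duplicate of strong
]
names, dep = online_select(stream, y, siblings)
print(names, round(dep, 6))
```

On this toy stream only `strong` survives: `noise` gives no dependency gain, `weak` is accepted and later dropped as redundant once `strong` arrives, and `copy` adds nothing. Note that `strong` takes the same value for cats and cars, which would count against it under a flat treatment of the classes but does not matter here because cat and car are not siblings.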

Acknowledgments

We are very grateful to the anonymous reviewers for their valuable comments and suggestions. This work is supported by grants from the National Natural Science Foundation of China (No. 61672272), the Natural Science Foundation of Fujian Province (Nos. 2018J01547 and 2018J01548), and the Department of Education of Fujian Province (No. JAT180318).

Author information

Authors and Affiliations

School of Computer Science, Minnan Normal University, Zhangzhou, 363000, People’s Republic of China
Shengxing Bai, Yaojin Lin, Yan Lv & Chenxi Wang
Laboratory of Data Science, Intelligence Application, Minnan Normal University, Zhangzhou, 363000, People’s Republic of China
Shengxing Bai, Yaojin Lin & Yan Lv
School of Mathematics and Statistics, Minnan Normal University, Zhangzhou, 363000, People’s Republic of China
Jinkun Chen

Corresponding author

Correspondence to Yaojin Lin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bai, S., Lin, Y., Lv, Y. et al. Kernelized fuzzy rough sets based online streaming feature selection for large-scale hierarchical classification. Appl Intell 51, 1602–1615 (2021). https://doi.org/10.1007/s10489-020-01863-5

Published: 30 September 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s10489-020-01863-5
