Abstract
Feature selection (FS) is a core part of the data processing pipeline. Using ensembles for FS is a relatively new approach that aims to introduce more diversity into the selected features, which yields better performance as well as more robust and accurate results. An aggregation step combines the output of each FS method and generates a single feature subset. In this paper, a novel ensemble method for FS, “EFSCAT”, is proposed, which ranks all the features and then clusters the most related ones. To reduce the size of each ranking, an automatic threshold is introduced in every ranker. This added thresholding step improves computational efficiency because it cuts off the low-ranking features in each ranker's initial ranking. Mean-shift clustering is then used to combine the results of the rankers, which makes the aggregation step time efficient. “EFSCAT” makes classification more robust and stable.
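The rank, threshold, and cluster steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the threshold rule or the clustering features, so the mean-score cutoff, the "support" value (the fraction of rankers that kept a feature), the bandwidth, and all function names and scores below are assumptions made for illustration only.

```python
from statistics import mean

def threshold_ranker(scores):
    """Keep features scoring at least the ranker's mean score (assumed rule)."""
    cutoff = mean(scores.values())
    return {f: s for f, s in scores.items() if s >= cutoff}

def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift in one dimension; returns one mode per point."""
    modes = []
    for p in points:
        x = p
        for _ in range(max_iter):
            # All points inside the window centred on the current estimate.
            window = [q for q in points if abs(q - x) <= bandwidth]
            if not window:
                break
            new_x = mean(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes

def ensemble_select(rankers, bandwidth=0.2):
    """Threshold each ranker, then cluster features by ranker support."""
    kept = [threshold_ranker(r) for r in rankers]
    features = sorted(set().union(*kept))
    # Support: fraction of rankers whose thresholded list contains the feature.
    support = {f: sum(f in k for k in kept) / len(kept) for f in features}
    modes = mean_shift_1d([support[f] for f in features], bandwidth)
    best = max(modes)
    # Return the features whose mode coincides with the highest mode.
    return [f for f, m in zip(features, modes) if abs(m - best) <= bandwidth / 2]

# Hypothetical scores from three rankers (higher means more relevant).
rankers = [
    {"f1": 9, "f2": 8, "f3": 2, "f4": 1, "f5": 4},
    {"f1": 7, "f2": 9, "f3": 1, "f4": 6, "f5": 2},
    {"f1": 8, "f2": 3, "f3": 2, "f4": 9, "f5": 1},
]
selected = ensemble_select(rankers)
print(selected)
```

In this toy setup the thresholding step shortens every ranking before aggregation, so the clustering step only sees the surviving features. A wider bandwidth merges features with lower support into the top cluster, so the bandwidth controls how strict the final subset is.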
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Jahan, M.S., Amjad, A., Qamar, U., Riaz, M.T., Ayub, K. (2020). A Novel Approach for Ensemble Feature Selection Using Clustering with Automatic Threshold. In: Mata-Rivera, M.F., Zagal-Flores, R., Barria-Huidobro, C. (eds) Telematics and Computing. WITCOM 2020. Communications in Computer and Information Science, vol 1280. Springer, Cham. https://doi.org/10.1007/978-3-030-62554-2_28
DOI: https://doi.org/10.1007/978-3-030-62554-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62553-5
Online ISBN: 978-3-030-62554-2
eBook Packages: Computer Science, Computer Science (R0)