A Novel Approach for Ensemble Feature Selection Using Clustering with Automatic Threshold

  • Conference paper
Telematics and Computing (WITCOM 2020)

Abstract

Feature Selection (FS) is a core part of the data-processing pipeline. Using an ensemble for FS is a relatively new approach that aims to introduce more diversity into the selected feature subsets, which provides better performance as well as more robust and accurate results. An aggregation step combines the outputs of the individual FS methods into a single feature subset. In this paper, a novel ensemble FS method, “EFSCAT”, is proposed; it ranks all the features and then clusters the most related ones. To reduce the size of each ranking, an automatic threshold is introduced in every ranker. This thresholding step improves computational efficiency because it cuts off the low-ranking features produced by each ranker. Mean-shift clustering is then used to combine the results of the rankers, which makes the aggregation step highly time-efficient. “EFSCAT” makes classification more robust and stable.
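The abstract does not specify which rankers are used, how the automatic threshold is computed, or exactly how mean-shift is applied to the rankings, so the following is only a minimal Python sketch of the three-stage idea under stated assumptions. The two rankers (absolute Pearson correlation and an AUC-based score), the "largest drop in the sorted scores" cut-off standing in for the automatic threshold, and clustering each surviving feature's vector of per-ranker scores with a flat-kernel mean-shift are all illustrative choices. Every function name (`abs_correlation`, `auc_gap`, `auto_threshold`, `mean_shift`, `efscat_sketch`) and the toy data are hypothetical.

```python
"""Hypothetical sketch of an EFSCAT-style pipeline: rank -> threshold -> cluster.
The two rankers, the largest-gap threshold and the use of mean-shift on
per-ranker score vectors are illustrative assumptions, not the paper's method."""
import math
from statistics import mean, pstdev


def abs_correlation(xs, ys):
    """Assumed ranker 1: |Pearson correlation| between a feature and 0/1 labels."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(cov / (sx * sy))


def auc_gap(xs, ys):
    """Assumed ranker 2: distance of the class-1 vs class-0 AUC from chance, in [0, 1]."""
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos = [x for x, y in zip(xs, ys) if y == 1]
    if not neg or not pos:
        return 0.0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return abs(wins / (len(pos) * len(neg)) - 0.5) * 2


def auto_threshold(scores):
    """Hypothetical automatic cut-off: keep the features ranked above the
    largest drop between consecutive scores in the sorted ranking."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        return ranked
    gaps = [scores[ranked[i]] - scores[ranked[i + 1]] for i in range(len(ranked) - 1)]
    return ranked[: gaps.index(max(gaps)) + 1]


def mean_shift(points, bandwidth, max_iter=100, tol=1e-9):
    """Flat-kernel mean-shift: move each start point to the mean of the data
    points within `bandwidth` until it stops moving; return the converged modes."""
    modes = []
    for p in points:
        m = p
        for _ in range(max_iter):
            nbrs = [q for q in points if math.dist(q, m) <= bandwidth]
            new = tuple(mean(c) for c in zip(*nbrs))
            moved = math.dist(new, m)
            m = new
            if moved < tol:
                break
        modes.append(m)
    return modes


def efscat_sketch(columns, labels, rankers, bandwidth=0.3):
    """Return the features in the mean-shift cluster with the highest scores."""
    names = list(columns)
    profiles, survivors = [], set()
    for rank in rankers:
        raw = {f: rank(columns[f], labels) for f in names}
        lo, hi = min(raw.values()), max(raw.values())
        norm = {f: (v - lo) / ((hi - lo) or 1.0) for f, v in raw.items()}
        kept = set(auto_threshold(norm))
        survivors |= kept
        # A feature cut off by this ranker's threshold contributes 0 here.
        profiles.append({f: norm[f] if f in kept else 0.0 for f in names})
    # Only features kept by at least one ranker reach the clustering step.
    candidates = [f for f in names if f in survivors]
    points = [tuple(p[f] for p in profiles) for f in candidates]
    clusters = []  # (mode, member features); modes closer than bandwidth/2 merge
    for f, m in zip(candidates, mean_shift(points, bandwidth)):
        for mode, members in clusters:
            if math.dist(mode, m) <= bandwidth / 2:
                members.append(f)
                break
        else:
            clusters.append((m, [f]))
    _, best = max(clusters, key=lambda c: sum(c[0]))
    return sorted(best)


LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
COLUMNS = {  # made-up toy data
    "f_strong": [0.1, 0.2, 0.0, 0.1, 0.9, 1.0, 0.8, 0.9],
    "f_strong2": [1, 2, 1, 2, 8, 9, 8, 9],
    "f_weak": [1, 2, 3, 4, 3, 4, 5, 6],
    "f_noise": [5, 1, 4, 2, 5, 1, 4, 2],
}

if __name__ == "__main__":
    print(efscat_sketch(COLUMNS, LABELS, [abs_correlation, auc_gap]))  # → ['f_strong', 'f_strong2']
```

In this toy run both thresholds drop `f_noise`, so only three of the four features reach the clustering step, which illustrates where the efficiency gain described in the abstract would come from. The bandwidth controls how aggressively features are merged: with `bandwidth=0.5`, `f_weak` falls into the same cluster as the two strong features.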

Author information

Corresponding author: Muhammad Shah Jahan

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Jahan, M.S., Amjad, A., Qamar, U., Riaz, M.T., Ayub, K. (2020). A Novel Approach for Ensemble Feature Selection Using Clustering with Automatic Threshold. In: Mata-Rivera, M.F., Zagal-Flores, R., Barria-Huidobro, C. (eds) Telematics and Computing. WITCOM 2020. Communications in Computer and Information Science, vol 1280. Springer, Cham. https://doi.org/10.1007/978-3-030-62554-2_28

  • DOI: https://doi.org/10.1007/978-3-030-62554-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62553-5

  • Online ISBN: 978-3-030-62554-2

  • eBook Packages: Computer Science, Computer Science (R0)
