
Addressing Low Dimensionality Feature Subset Selection: ReliefF(-k) or Extended Correlation-Based Feature Selection (eCFS)?

  • Conference paper
14th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2019)

Abstract

This paper tackles problems where attribute selection not only retains very few features but also yields a classification performance, in terms of accuracy, below that of the full attribute set. Correlation-based feature selection (CFS) is taken as the baseline attribute subset selector owing to its popularity and high performance. Around one hundred data sets were collected and submitted to CFS; the problems simultaneously fulfilling two conditions, (a) fewer than six selected attributes and (b) less than forty per cent of the attributes selected, were then explored in two directions. First, within the scope of data selection at the feature level, an advanced contemporary approach was applied, along with some options proposed in a prior work. Second, the pre-processed and the initial problems were tested with several robust classifiers. Moreover, this work introduces a new taxonomy of feature selection according to the solution type and the way it is computed. The test bed comprises seven problems characterised by low dimensionality after applying CFS: three of them report a single selected attribute, another has two extracted features, and the three remaining data sets retain four or five attributes; additionally, the initial feature-set size ranges between six and twenty-nine, and the complexity of the problems, in terms of classes, fluctuates between two and twenty-one, yielding averages of around sixteen and five for the two aforementioned properties, respectively. The contribution concludes that the advanced procedure (extended CFS) is suitable for problems where only one or two attributes are selected by CFS; for data sets with more than two selected features, the baseline method is preferable to the advanced one, although the considered feature-ranking method achieved intermediate results.
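As a minimal sketch of the setting described above, the snippet below illustrates the CFS subset-merit heuristic (as formulated in Hall's 1999 thesis) and the two low-dimensionality conditions used to screen data sets. The function and variable names are hypothetical, introduced only for illustration; they do not come from the paper.

```python
import math

def cfs_merit(k: int, mean_r_cf: float, mean_r_ff: float) -> float:
    """CFS merit of a k-attribute subset: k * r_cf / sqrt(k + k(k-1) * r_ff),
    where r_cf is the mean feature-class correlation and r_ff the mean
    feature-feature inter-correlation (Hall, 1999)."""
    return k * mean_r_cf / math.sqrt(k + k * (k - 1) * mean_r_ff)

def is_low_dimensional(n_selected: int, n_original: int) -> bool:
    """A problem qualifies for the test bed when CFS keeps fewer than six
    attributes AND less than forty per cent of the original set."""
    return n_selected < 6 and n_selected / n_original < 0.40

# A subset whose features correlate with the class but not with each other
# scores a higher merit than a redundant one:
print(cfs_merit(3, 0.6, 0.1) > cfs_merit(3, 0.6, 0.8))  # True

print(is_low_dimensional(2, 16))  # True: 2 < 6 and 12.5% < 40%
print(is_low_dimensional(5, 10))  # False: 50% of attributes retained
```

The merit heuristic rewards subsets that are predictive of the class yet internally non-redundant, which is why CFS often keeps very few attributes on the problems studied here.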



Acknowledgments

This work has been partially subsidised by TIN2014-55894-C2-R and TIN2017-88209-C2-2-R projects of the Spanish Inter-Ministerial Commission of Science and Technology (MICYT), FEDER funds and the P11-TIC-7528 project of the “Junta de Andalucía” (Spain).

Author information

Correspondence to Antonio J. Tallón-Ballesteros.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Tallón-Ballesteros, A.J., Cavique, L., Fong, S. (2020). Addressing Low Dimensionality Feature Subset Selection: ReliefF(-k) or Extended Correlation-Based Feature Selection (eCFS)? In: Martínez Álvarez, F., Troncoso Lora, A., Sáez Muñoz, J., Quintián, H., Corchado, E. (eds.) 14th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2019). SOCO 2019. Advances in Intelligent Systems and Computing, vol. 950. Springer, Cham. https://doi.org/10.1007/978-3-030-20055-8_24
