Abstract
This paper tackles problems where attribute selection not only retains very few features but also degrades classification accuracy with respect to the full attribute set. Correlation-based feature selection (CFS) is taken as the baseline attribute subset selector owing to its popularity and strong performance. Around one hundred data sets were collected and submitted to CFS; the problems simultaneously fulfilling two conditions, (a) fewer than six selected attributes and (b) fewer than forty per cent of the attributes selected, were then examined in two directions. First, in the scope of data selection at the feature level, an advanced contemporary approach was applied, along with some options proposed in a prior work. Second, the pre-processed and original problems were evaluated with several robust classifiers. Moreover, this work introduces a new taxonomy of feature selection according to the solution type and the way it is computed. The test bed comprises seven problems characterised by low dimensionality after the application of CFS: three of them retain a single attribute, one retains two features, and the remaining three retain four or five attributes. The initial feature sets range from six to twenty-nine attributes, and the complexity of the problems, in terms of classes, ranges from two to twenty-one, with averages of about sixteen attributes and five classes, respectively. The contribution concludes that the advanced procedure (extended CFS) is suitable for problems where CFS selects only one or two attributes; for data sets with more than two selected features, the baseline method is preferable to the advanced one, although the considered feature-ranking method achieved intermediate results.
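The two selection conditions described in the abstract, together with CFS's well-known merit heuristic (Hall, 1999), can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names and the example attribute counts are hypothetical, chosen only to show how a data set would qualify for the study's test bed.

```python
import math

def is_low_dimensional(n_selected: int, n_original: int) -> bool:
    """A data set qualifies when CFS retains (a) fewer than six
    attributes and (b) less than forty per cent of the original set."""
    return n_selected < 6 and (n_selected / n_original) < 0.40

def cfs_merit(k: int, avg_rcf: float, avg_rff: float) -> float:
    """CFS subset merit (Hall, 1999): k attributes with mean
    feature-class correlation avg_rcf and mean feature-feature
    correlation avg_rff."""
    return (k * avg_rcf) / math.sqrt(k + k * (k - 1) * avg_rff)

# Hypothetical examples: 2 of 29 attributes retained qualifies,
# while 5 of 10 (fifty per cent) does not.
print(is_low_dimensional(2, 29))  # True
print(is_low_dimensional(5, 10))  # False
```

Note that for a single-attribute subset (k = 1) the merit reduces to the feature-class correlation itself, which is why CFS can legitimately stop after retaining one highly correlated attribute, the borderline situation this paper studies.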
Acknowledgments
This work has been partially subsidised by TIN2014-55894-C2-R and TIN2017-88209-C2-2-R projects of the Spanish Inter-Ministerial Commission of Science and Technology (MICYT), FEDER funds and the P11-TIC-7528 project of the “Junta de Andalucía” (Spain).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Tallón-Ballesteros, A.J., Cavique, L., Fong, S. (2020). Addressing Low Dimensionality Feature Subset Selection: ReliefF(-k) or Extended Correlation-Based Feature Selection (eCFS)? In: Martínez Álvarez, F., Troncoso Lora, A., Sáez Muñoz, J., Quintián, H., Corchado, E. (eds) 14th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2019). SOCO 2019. Advances in Intelligent Systems and Computing, vol 950. Springer, Cham. https://doi.org/10.1007/978-3-030-20055-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20054-1
Online ISBN: 978-3-030-20055-8
eBook Packages: Intelligent Technologies and Robotics (R0)