Abstract
This paper tackles problems where attribute selection not only retains very few features but also degrades classification accuracy with respect to the full attribute set. Correlation-based feature selection (CFS) is taken as the baseline attribute subset selector owing to its popularity and strong performance. Around one hundred data sets were collected and submitted to CFS; the problems simultaneously fulfilling two conditions, (a) fewer than six selected attributes and (b) fewer than forty per cent of the attributes selected, were then examined in two directions. First, in the scope of data selection at the feature level, an advanced contemporary approach was applied, along with some options proposed in a prior work. Second, the pre-processed and original problems were evaluated with several robust classifiers. Moreover, this work introduces a new taxonomy of feature selection according to the solution type and the way it is computed. The test bed comprises seven problems characterised by low dimensionality after the application of CFS: three of them retain a single attribute, one retains two features, and the remaining three retain four or five attributes. The initial feature sets range from six to twenty-nine attributes, and the complexity of the problems, in terms of classes, ranges from two to twenty-one, with averages of about sixteen attributes and five classes, respectively. The contribution concludes that the advanced procedure (extended CFS) is suitable for problems where CFS selects only one or two attributes; for data sets with more than two selected features, the baseline method is preferable to the advanced one, although the considered feature-ranking method achieved intermediate results.
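The two selection conditions described in the abstract, together with CFS's well-known merit heuristic (Hall, 1999), can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names and the example attribute counts are hypothetical, chosen only to show how a data set would qualify for the study's test bed.

```python
import math

def is_low_dimensional(n_selected: int, n_original: int) -> bool:
    """A data set qualifies when CFS retains (a) fewer than six
    attributes and (b) less than forty per cent of the original set."""
    return n_selected < 6 and (n_selected / n_original) < 0.40

def cfs_merit(k: int, avg_rcf: float, avg_rff: float) -> float:
    """CFS subset merit (Hall, 1999): k attributes with mean
    feature-class correlation avg_rcf and mean feature-feature
    correlation avg_rff."""
    return (k * avg_rcf) / math.sqrt(k + k * (k - 1) * avg_rff)

# Hypothetical examples: 2 of 29 attributes retained qualifies,
# while 5 of 10 (fifty per cent) does not.
print(is_low_dimensional(2, 29))  # True
print(is_low_dimensional(5, 10))  # False
```

Note that for a single-attribute subset (k = 1) the merit reduces to the feature-class correlation itself, which is why CFS can legitimately stop after retaining one highly correlated attribute, the borderline situation this paper studies.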
Acknowledgments
This work has been partially subsidised by TIN2014-55894-C2-R and TIN2017-88209-C2-2-R projects of the Spanish Inter-Ministerial Commission of Science and Technology (MICYT), FEDER funds and the P11-TIC-7528 project of the “Junta de Andalucía” (Spain).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Tallón-Ballesteros, A.J., Cavique, L., Fong, S. (2020). Addressing Low Dimensionality Feature Subset Selection: ReliefF(-k) or Extended Correlation-Based Feature Selection (eCFS)? In: Martínez Álvarez, F., Troncoso Lora, A., Sáez Muñoz, J., Quintián, H., Corchado, E. (eds) 14th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2019). SOCO 2019. Advances in Intelligent Systems and Computing, vol 950. Springer, Cham. https://doi.org/10.1007/978-3-030-20055-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20054-1
Online ISBN: 978-3-030-20055-8
eBook Packages: Intelligent Technologies and Robotics (R0)