Abstract
The amount of data available in every field is constantly increasing, and so is the dimensionality of the datasets that describe it. High dimensionality complicates the processing of a dataset, since algorithms require more complex internal computations. To address this problem, multiple feature selection techniques have been developed. However, most of these techniques only output a list of features ordered by relevance (a ranking); they do not indicate which feature subset best represents the data. Additional strategies are therefore needed to find this best feature subset. This paper proposes a novel criterion, based on sequential search methods, for choosing feature subsets automatically without exhaustively evaluating the rankings produced by filter selectors. Experimental results on 27 real datasets, using eight selectors and six classifiers to evaluate the outcomes, show that the best feature subset is reached.
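The general idea described in the abstract — take a filter ranking and grow nested prefixes of it, stopping without evaluating every candidate subset — can be sketched as follows. This is only an illustrative sketch: the filter score (difference of per-class means), the evaluation classifier (leave-one-out 1-NN), the patience-based stopping rule, and all function names are assumptions for demonstration, not the criterion proposed in the paper.

```python
# Illustrative sketch: select a feature subset from a filter ranking by
# sequentially evaluating ranked prefixes with an early-stopping rule.
# Filter score, classifier, and stopping rule are assumed, not the paper's.

def rank_features(X, y):
    """Rank feature indices by a simple filter score: the absolute
    difference of per-class means (binary labels 0/1 assumed)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(sum(col0) / len(col0) - sum(col1) / len(col1)))
    return sorted(range(n_features), key=lambda j: -scores[j])

def accuracy(X, y, features):
    """Leave-one-out 1-nearest-neighbour accuracy on the chosen features."""
    correct = 0
    for i, (row, label) in enumerate(zip(X, y)):
        best, best_d = None, float("inf")
        for k, (other, other_label) in enumerate(zip(X, y)):
            if k == i:
                continue
            d = sum((row[j] - other[j]) ** 2 for j in features)
            if d < best_d:
                best, best_d = other_label, d
        correct += best == label
    return correct / len(y)

def best_prefix(X, y, patience=2):
    """Grow the ranked prefix one feature at a time; stop after
    `patience` consecutive non-improving steps instead of trying
    every possible prefix length."""
    ranking = rank_features(X, y)
    best_acc, best_k, stale = -1.0, 0, 0
    for k in range(1, len(ranking) + 1):
        acc = accuracy(X, y, ranking[:k])
        if acc > best_acc:
            best_acc, best_k, stale = acc, k, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return ranking[:best_k], best_acc
```

On a toy dataset where one feature separates the classes and the other is noise, `best_prefix` returns the single discriminative feature without scanning every prefix length. Any filter ranker and wrapper evaluator could be substituted for the two placeholder functions.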
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Vargas-Ruíz, L., Franco-Arcega, A., Alonso-Lavernia, MdlÁ. (2018). A Novel Criterion to Obtain the Best Feature Subset from Filter Ranking Methods. In: Martínez-Trinidad, J., Carrasco-Ochoa, J., Olvera-López, J., Sarkar, S. (eds) Pattern Recognition. MCPR 2018. Lecture Notes in Computer Science(), vol 10880. Springer, Cham. https://doi.org/10.1007/978-3-319-92198-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92197-6
Online ISBN: 978-3-319-92198-3
eBook Packages: Computer Science (R0)