Abstract
In wrapper-based feature selection, the more states visited during the search phase of the algorithm, the greater the likelihood of finding a feature subset that has high internal accuracy but generalizes poorly. When this occurs, we say that the algorithm has overfitted the training data. We outline a set of experiments that demonstrate this effect, and we introduce a modified genetic algorithm that addresses the overfitting problem by stopping the search before overfitting occurs. This new algorithm, GAWES (Genetic Algorithm With Early Stopping), reduces the level of overfitting and yields feature subsets with better generalization accuracy.
This research was funded by Science Foundation Ireland Grant No. SFI-02 IN. 1I111
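To make the early-stopping idea concrete, the sketch below wraps a k-NN classifier in a genetic search over feature subsets: fitness is internal (cross-validation) accuracy on the training data, while a held-out validation set monitors generalization so the search can halt before it overfits. This is a minimal illustration under stated assumptions, not the authors' GAWES implementation: the dataset, classifier, population size, operator rates, and the five-generation patience rule are all illustrative choices.

```python
# Sketch of wrapper-based feature selection via a genetic algorithm
# with early stopping. Fitness = internal CV accuracy of the wrapped
# classifier; a held-out validation set decides when to stop searching.
# All settings here are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
n_features = X.shape[1]

def fitness(mask):
    """Internal (training) accuracy: what the wrapper search optimizes."""
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X_train[:, mask], y_train, cv=3).mean()

def val_accuracy(mask):
    """Held-out accuracy, used only to monitor overfitting."""
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=3)
    clf.fit(X_train[:, mask], y_train)
    return clf.score(X_val[:, mask], y_val)

pop = rng.random((20, n_features)) < 0.5            # random initial subsets
best_mask, best_val, patience = None, -1.0, 0
for gen in range(50):
    scores = np.array([fitness(m) for m in pop])
    elite = pop[scores.argmax()]
    # Early stopping: quit when the fittest subset's validation accuracy
    # has not improved for 5 generations (assumed patience value).
    v = val_accuracy(elite)
    if v > best_val:
        best_mask, best_val, patience = elite.copy(), v, 0
    else:
        patience += 1
        if patience >= 5:
            break
    # Tournament selection, uniform crossover, bit-flip mutation, elitism.
    parents = pop[[max(rng.choice(len(pop), 2), key=lambda i: scores[i])
                   for _ in range(len(pop))]]
    cross = rng.random(parents.shape) < 0.5
    children = np.where(cross, parents, np.roll(parents, 1, axis=0))
    children ^= rng.random(children.shape) < 0.02   # flip ~2% of bits
    pop = children
    pop[0] = elite

print(f"selected {best_mask.sum()} features, "
      f"validation accuracy {best_val:.3f}")
```

Letting the loop run all 50 generations instead of breaking typically keeps raising internal CV accuracy while the held-out score stalls or drops, which is the overfitting pattern the abstract describes.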
Copyright information
© 2005 Springer-Verlag London Limited
Cite this paper
Loughrey, J., Cunningham, P. (2005). Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse it Gets. In: Bramer, M., Coenen, F., Allen, T. (eds) Research and Development in Intelligent Systems XXI. SGAI 2004. Springer, London. https://doi.org/10.1007/1-84628-102-4_3
DOI: https://doi.org/10.1007/1-84628-102-4_3
Publisher Name: Springer, London
Print ISBN: 978-1-85233-907-4
Online ISBN: 978-1-84628-102-0