Abstract
This paper investigates the use of a validation data set with Genetic Programming (GP) to counteract over-fitting. It considers fitness on both training and validation data, combined with an early stopping mechanism, to improve generalisation while significantly reducing run times.
The method is tested on six benchmark binary classification data sets. Results of this preliminary investigation suggest that the strategy can deliver equivalent or improved results on test data.
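The general idea of curtailing evolution with a validation set can be sketched as follows. This is a minimal illustration only, not the authors' method: the abstract does not state the stopping criterion, so the patience-based rule is an assumption, and the names `evolve_with_early_stopping`, `next_generation`, `train_fitness`, `validation_fitness`, and `patience` are all hypothetical. Higher fitness is assumed to be better.

```python
from typing import Callable, List, Tuple, TypeVar

Individual = TypeVar("Individual")


def evolve_with_early_stopping(
    population: List[Individual],
    next_generation: Callable[[List[Individual]], List[Individual]],
    train_fitness: Callable[[Individual], float],
    validation_fitness: Callable[[Individual], float],
    max_generations: int = 50,
    patience: int = 5,
) -> Tuple[Individual, int]:
    """Return (best individual by validation fitness, generations run).

    Assumed rule (not taken from the paper): stop once the validation
    fitness of the training champion has failed to improve for
    `patience` consecutive generations.
    """
    # Selection inside the run uses training fitness only.
    best = max(population, key=train_fitness)
    best_val = validation_fitness(best)
    stale = 0
    generations_run = 0

    for _ in range(max_generations):
        population = next_generation(population)
        generations_run += 1

        # The validation set is consulted only to monitor generalisation.
        champion = max(population, key=train_fitness)
        val = validation_fitness(champion)

        if val > best_val:
            best, best_val, stale = champion, val, 0
        else:
            stale += 1
            if stale >= patience:
                break  # curtail the run early, saving the remaining generations

    return best, generations_run
```

In a toy run where training fitness keeps rising but validation fitness peaks and then falls, the loop returns the individual from the validation peak and stops well before `max_generations`, which is the source of the run-time saving the abstract refers to.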
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fitzgerald, J., Ryan, C. (2011). Validation Sets for Evolutionary Curtailment with Improved Generalisation. In: Lee, G., Howard, D., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2011. Lecture Notes in Computer Science, vol 6935. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24082-9_35
DOI: https://doi.org/10.1007/978-3-642-24082-9_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24081-2
Online ISBN: 978-3-642-24082-9
eBook Packages: Computer Science, Computer Science (R0)