
Random Sampling Technique for Overfitting Control in Genetic Programming

  • Conference paper
Genetic Programming (EuroGP 2012)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7244)

Abstract

One of the areas of Genetic Programming (GP) that, in comparison to other Machine Learning methods, has received less research effort is generalization. Generalization is the ability of a solution to perform well on unseen cases. It is one of the most important goals of any Machine Learning method, although in GP this issue has only recently started to receive more attention. In this work we perform a comparative analysis of a particularly interesting configuration of the Random Sampling Technique (RST) against the Standard GP approach. Experiments are conducted on three real-world multidimensional symbolic regression datasets, the first two in the pharmacokinetics domain and the third in the forestry domain. The results show that the RST decreases overfitting on all datasets. This technique also improves testing fitness on two of the three datasets. Furthermore, it does so while producing considerably smaller and less complex solutions. We discuss the possible reasons for the good performance of the RST, as well as its possible limitations.
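The abstract does not spell out how the RST selects its data. As a rough illustration only, the sketch below assumes the common reading of random sampling in GP: at every generation, all individuals are scored on the same freshly drawn random subset of the training cases rather than on the full training set. The subset `fraction`, the use of RMSE as fitness, and every name in the code (`sample_training_cases`, `rmse`, `evaluate_generation`) are hypothetical. This is not the authors' implementation, and it does not reproduce the specific RST configuration studied in the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical data layout: each case is (input vector, target value).
Case = Tuple[Tuple[float, ...], float]
Model = Callable[[Tuple[float, ...]], float]


def sample_training_cases(train: Sequence[Case], fraction: float,
                          rng: random.Random) -> List[Case]:
    """Draw a random subset of training cases without replacement.

    `fraction` is an assumed parameter (share of the training set used per
    generation); at least one case and at most all cases are drawn.
    """
    k = max(1, min(len(train), round(fraction * len(train))))
    return rng.sample(list(train), k)


def rmse(model: Model, cases: Sequence[Case]) -> float:
    """Root mean squared error of `model` on `cases` (assumed fitness; lower is better)."""
    sq = sum((model(x) - y) ** 2 for x, y in cases)
    return math.sqrt(sq / len(cases))


def evaluate_generation(population: Sequence[Model], train: Sequence[Case],
                        fraction: float,
                        rng: random.Random) -> Tuple[List[float], List[Case]]:
    """Score every individual on the same subset, resampled once per generation."""
    subset = sample_training_cases(train, fraction, rng)
    fitnesses = [rmse(ind, subset) for ind in population]
    return fitnesses, subset


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy target y = 2x; the two "individuals" stand in for evolved GP trees.
    train = [((float(i),), 2.0 * i) for i in range(20)]
    population = [lambda x: 2.0 * x[0], lambda x: x[0] + 1.0]
    for gen in range(3):
        fits, subset = evaluate_generation(population, train, 0.25, rng)
        print(gen, len(subset), [round(f, 3) for f in fits])
```

Sharing one subset across the whole population keeps selection within a generation fair, while resampling between generations means no individual is rewarded for repeatedly fitting the same few cases. One plausible intuition, in line with the abstract's framing, is that this discourages memorizing the training data; how much it actually reduces overfitting is what the paper evaluates empirically.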

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gonçalves, I., Silva, S., Melo, J.B., Carreiras, J.M.B. (2012). Random Sampling Technique for Overfitting Control in Genetic Programming. In: Moraglio, A., Silva, S., Krawiec, K., Machado, P., Cotta, C. (eds) Genetic Programming. EuroGP 2012. Lecture Notes in Computer Science, vol 7244. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29139-5_19

  • DOI: https://doi.org/10.1007/978-3-642-29139-5_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29138-8

  • Online ISBN: 978-3-642-29139-5

  • eBook Packages: Computer Science, Computer Science (R0)
