Abstract
This paper introduces variance-based layered learning GP, a new method that improves the generalization ability of genetic programming (GP) on symbolic regression problems. In this approach, several datasets, called primitive training sets, are derived from the original training data. They are ordered from less complex to more complex according to a suitable complexity measure, and even the last primitive dataset is less complex than the original training set. The approach decomposes the evolution process into several hierarchical layers. The first layer of the evolution uses the least complex (smoothest) primitive training set; in subsequent layers, progressively more complex primitive sets are given to the GP engine, and finally the original training data is given to the algorithm. We use the variance of a function's output values as the measure of functional complexity. This measure is used both to generate smoother training data and to control the functional complexity of the solutions, thereby reducing overfitting. Experiments conducted on four real-world and three artificial symbolic regression problems demonstrate that the approach enhances the generalization ability of GP and reduces the complexity of the obtained solutions.
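As a rough illustration of the layered workflow described above, the sketch below derives lower-variance primitive training sets by moving-average smoothing of the targets and feeds them to a GP engine layer by layer. This is a minimal sketch: the smoothing scheme, the window sizes, and the `gp_run` interface are assumptions made for illustration, not the construction used in the paper.

```python
# Hypothetical sketch of variance-based layered learning for symbolic regression.
# The smoothing scheme and the gp_run() interface are illustrative assumptions,
# not the authors' exact procedure.
import numpy as np

def output_variance(y):
    """Variance of the target values, used as a proxy for the functional
    complexity of a training set."""
    return float(np.var(y))

def make_primitive_sets(X, y, windows=(9, 5, 3)):
    """Derive primitive training sets of increasing complexity by
    moving-average smoothing of the targets along the first input feature.
    Wider windows give smoother, lower-variance targets, so the returned
    list is ordered from least to most complex."""
    order = np.argsort(X[:, 0])
    primitive_sets = []
    for w in windows:
        kernel = np.ones(w) / w
        y_smooth = np.empty(len(y), dtype=float)
        y_smooth[order] = np.convolve(y[order], kernel, mode="same")
        primitive_sets.append((X, y_smooth))
    return primitive_sets

def layered_gp(X, y, gp_run, gens_per_layer=20):
    """Run a GP engine layer by layer: early layers see the smoothest
    (lowest-variance) primitive set, the final layer sees the original data.
    gp_run(X, y, n_gens, seed_population) is an assumed interface that
    evolves a population and returns it for seeding the next layer."""
    population = None
    for X_layer, y_layer in make_primitive_sets(X, y) + [(X, y)]:
        print("layer target variance:", round(output_variance(y_layer), 4))
        population = gp_run(X_layer, y_layer, gens_per_layer, population)
    return population
```

In this sketch each layer seeds the next with its final population, so solutions evolved on smooth, low-variance data are gradually refined toward the full-complexity target, mirroring the layered structure of the proposed method.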
About this article
Cite this article
Amir Haeri, M., Ebadzadeh, M.M. & Folino, G. Improving GP generalization: a variance-based layered learning approach. Genet Program Evolvable Mach 16, 27–55 (2015). https://doi.org/10.1007/s10710-014-9220-6