DOI: 10.1145/3071178.3071330
Public Access

Improving generalization of evolved programs through automatic simplification

Published: 01 July 2017


Programs evolved by genetic programming often fail to generalize to unseen data, so reliably synthesizing programs that do generalize remains an important open problem. We present evidence that smaller programs evolved using the PushGP system tend to generalize better over a range of program synthesis problems. As in many genetic programming systems, programs evolved by PushGP usually contain pieces that can be removed without changing the program's behavior. We describe methods for automatically simplifying evolved programs to make them smaller and potentially improve their generalization. We present five simplification methods and analyze their strengths and weaknesses on a suite of general program synthesis benchmark problems. All of our methods use a straightforward hill-climbing procedure to remove pieces of a program while ensuring that the resulting program produces the same errors on the training data as the original program. We show that automatic simplification, previously used both for post-run analysis and as a genetic operator, can significantly improve the generalization rates of evolved programs.
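
The hill-climbing idea in the abstract can be illustrated with a minimal sketch. It is not PushGP and not one of the paper's five methods: the toy stack language, the `run` interpreter, the `training_errors` function, and the `steps` and `max_remove` parameters are all assumptions made for illustration. The sketch repeatedly deletes a few randomly chosen instructions and keeps the smaller program only if its error vector on the training cases is unchanged.

```python
import random
from typing import Callable, List, Optional

Program = List[str]


def run(program: Program, x: int) -> int:
    """Interpret a hypothetical toy stack language (not Push).

    Instructions that lack enough arguments are skipped.
    """
    stack: List[int] = []
    for token in program:
        if token == "x":
            stack.append(x)  # push the program input
        elif token == "dup":
            if stack:
                stack.append(stack[-1])
        elif token == "add":
            if len(stack) >= 2:
                right = stack.pop()
                left = stack.pop()
                stack.append(left + right)
        elif token == "noop":
            continue
        else:
            stack.append(int(token))  # integer literal
    return stack[-1] if stack else 0


def training_errors(program: Program) -> List[int]:
    """Absolute error per training case for the assumed target f(x) = 2x + 1."""
    return [abs(run(program, x) - (2 * x + 1)) for x in range(5)]


def simplify(
    program: Program,
    error_fn: Callable[[Program], List[int]],
    steps: int = 500,
    max_remove: int = 3,
    rng: Optional[random.Random] = None,
) -> Program:
    """Hill-climb toward a smaller program with identical training errors."""
    rng = rng or random.Random()
    target = error_fn(program)
    current = list(program)
    for _ in range(steps):
        if not current:
            break
        # Pick 1..max_remove positions to delete in this step.
        count = rng.randint(1, min(max_remove, len(current)))
        drop = set(rng.sample(range(len(current)), count))
        candidate = [tok for i, tok in enumerate(current) if i not in drop]
        # Accept the deletion only if every training error is unchanged.
        if error_fn(candidate) == target:
            current = candidate
    return current


# A made-up "evolved" program with a useless literal and two noops.
evolved = ["3", "noop", "x", "dup", "add", "noop", "1", "add"]
simplified = simplify(evolved, training_errors, rng=random.Random(0))
print(simplified)
```

In this toy case only the literal `3` and the two `noop` tokens can be removed without changing the training errors, so the climber should settle on `x dup add 1 add`. Whether a simplified program also does better on held-out data is the empirical question the paper studies; this sketch only demonstrates the error-preserving removal step.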









    Published In

    GECCO '17: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2017
    1427 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].



    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. automatic simplification
    2. generalization
    3. genetic programming
    4. overfitting
    5. push


    • Research-article



    GECCO '17

    Acceptance Rates

    GECCO '17 Paper Acceptance Rate 178 of 462 submissions, 39%;
    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%



    Cited By

    • (2025) Code Building Genetic Programming is Faster than PushGP. Genetic Programming Theory and Practice XXI, 133-150. DOI: 10.1007/978-981-96-0077-9_7. Online publication date: 28-Feb-2025
    • (2024) Inexact Simplification of Symbolic Regression Expressions with Locality-sensitive Hashing. Proceedings of the Genetic and Evolutionary Computation Conference, 896-904. DOI: 10.1145/3638529.3654147. Online publication date: 14-Jul-2024
    • (2024) On the Nature of the Phenotype in Tree Genetic Programming. Proceedings of the Genetic and Evolutionary Computation Conference, 868-877. DOI: 10.1145/3638529.3654129. Online publication date: 14-Jul-2024
    • (2024) Enhancing Intercropping Yield Predictability Using Optimally Driven Feedback Neural Network and Loss Functions. IEEE Access 12, 162769-162787. DOI: 10.1109/ACCESS.2024.3486101. Online publication date: 2024
    • (2024) The Impact of Step Limits on Generalization and Stability in Software Synthesis. Genetic Programming Theory and Practice XX, 87-104. DOI: 10.1007/978-981-99-8413-8_5. Online publication date: 18-Feb-2024
    • (2024) Going Bananas! - Unfolding Program Synthesis with Origami. Intelligent Systems, 3-18. DOI: 10.1007/978-3-031-79032-4_1. Online publication date: 17-Nov-2024
    • (2024) Generational Computation Reduction in Informal Counterexample-Driven Genetic Programming. Genetic Programming, 21-37. DOI: 10.1007/978-3-031-56957-9_2. Online publication date: 3-Apr-2024
    • (2023) Human-Driven Genetic Programming for Program Synthesis: A Prototype. Proceedings of the Companion Conference on Genetic and Evolutionary Computation, 1981-1989. DOI: 10.1145/3583133.3596373. Online publication date: 15-Jul-2023
    • (2023) HOTGP - Higher-Order Typed Genetic Programming. Proceedings of the Genetic and Evolutionary Computation Conference, 1091-1099. DOI: 10.1145/3583131.3590464. Online publication date: 15-Jul-2023
    • (2023) Explainable Artificial Intelligence by Genetic Programming: A Survey. IEEE Transactions on Evolutionary Computation 27:3, 621-641. DOI: 10.1109/TEVC.2022.3225509. Online publication date: Jun-2023
