ABSTRACT
Underspecification occurs when several plausible hypotheses are consistent with a training and validation data set but behave differently when evaluated outside the training domain or distribution. Symbolic regression algorithms are prone to underspecification because they have an additional degree of freedom: the structural component of the regression model must also be specified. When faced with several likely alternatives, some algorithms apply Occam's razor and choose the simplest one. However, there is no guarantee that this is the correct decision, and the definition of "simplest" in symbolic regression is subjective. In this work we analyse the use of diversity control mechanisms to help fight the underspecification problem by providing the end user with multiple alternative models in a single execution. These alternative models can be used in a post-analysis process when the practitioner has additional knowledge. For this purpose, we implemented a fitness sharing mechanism in the Transformation-Interaction-Rational Symbolic Regression algorithm, with a distance function that measures how differently two models behave outside the domain of the training data. The results show that this adaptation is capable of producing multiple alternatives with similar fitness but distinct behavior outside this domain.
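As an illustration of the idea, the following is a minimal sketch of classic fitness sharing combined with a behavioral distance evaluated outside the training domain. It is not the implementation used in this work: the abstract does not specify the distance function, so the mean absolute difference of predictions on hypothetical probe points is an assumption, as are the names `behavior_distance`, `sharing`, `shared_fitness`, and the parameters `sigma` and `alpha`. Fitness is assumed to be maximized.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def behavior_distance(f: Model, g: Model, probe_points: Sequence[float]) -> float:
    """Mean absolute difference between two models' predictions.

    Assumption: probe_points lie outside the training domain, so the
    distance reflects how differently the models extrapolate.
    """
    return sum(abs(f(x) - g(x)) for x in probe_points) / len(probe_points)


def sharing(d: float, sigma: float, alpha: float = 1.0) -> float:
    """Standard sharing function: 1 at distance 0, 0 at or beyond radius sigma."""
    return 1.0 - (d / sigma) ** alpha if d < sigma else 0.0


def shared_fitness(
    models: Sequence[Model],
    raw_fitness: Sequence[float],
    probe_points: Sequence[float],
    sigma: float,
    alpha: float = 1.0,
) -> List[float]:
    """Divide each raw fitness (higher is better) by its niche count.

    The niche count includes the model itself (distance 0), so it is
    always at least 1 and the division is safe.
    """
    result = []
    for i, f in enumerate(models):
        niche_count = sum(
            sharing(behavior_distance(f, g, probe_points), sigma, alpha)
            for g in models
        )
        result.append(raw_fitness[i] / niche_count)
    return result


# Hypothetical example: two models that extrapolate identically and one
# that diverges on probe points outside a training domain such as [-1, 1].
models = [lambda x: x, lambda x: x, lambda x: x ** 3]
raw = [1.0, 1.0, 1.0]
probe = [2.0, 3.0]
shared = shared_fitness(models, raw, probe, sigma=1.0)
print(shared)  # [0.5, 0.5, 1.0]
```

In this toy case the two behaviorally identical models split their fitness, while the model that extrapolates differently keeps its full fitness, which is the pressure that encourages the population to retain distinct alternatives.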
Index Terms
- Fighting Underspecification in Symbolic Regression with Fitness Sharing