DOI: 10.1145/3377930.3390237

Feature standardisation and coefficient optimisation for effective symbolic regression

Published: 26 June 2020

Abstract

Symbolic regression is a common application of genetic programming where model structure and corresponding parameters are evolved in unison. In the majority of work exploring symbolic regression, features are used directly without acknowledgement of their relative scale or unit. This paper extends recent work on the importance of standardisation of features when conducting symbolic regression. Specifically, z-score standardisation is applied to both the input features and the response, so that evolution explores a model space with zero mean and unit variance. This paper demonstrates that standardisation allows a simpler function set to be used without increasing bias. Additionally, it is demonstrated that standardisation can significantly improve the performance of coefficient optimisation through gradient descent to produce accurate models. Through analysis of several benchmark data sets, we demonstrate that feature standardisation enables simple but effective approaches that are comparable in performance to the state-of-the-art in symbolic regression.
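To make the two ingredients named in the abstract concrete, the following is a minimal sketch in Python. It is not the authors' implementation: it assumes a fixed linear structure y = a * x + b in place of an evolved expression, uses synthetic noise-free data, and all function names (zscore_fit, zscore_apply, fit_coefficients, predict) are hypothetical. It z-scores both the input and the response, optimises the coefficients by batch gradient descent in the standardised space, and maps predictions back to the original scale.

```python
from statistics import mean, pstdev


def zscore_fit(values):
    """Return (mean, std) for a column; a constant column falls back to std = 1.0."""
    mu = mean(values)
    sigma = pstdev(values)
    return mu, (sigma if sigma > 0 else 1.0)


def zscore_apply(values, mu, sigma):
    """Map values to zero mean and unit variance using the fitted statistics."""
    return [(v - mu) / sigma for v in values]


def fit_coefficients(xs, ys, lr=0.1, steps=200):
    """Batch gradient descent on mean squared error for the assumed structure y = a * x + b."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def predict(x, a, b, x_stats, y_stats):
    """Standardise x, apply the model, then map the output back to the original scale."""
    (mu_x, sd_x), (mu_y, sd_y) = x_stats, y_stats
    return mu_y + sd_y * (a * (x - mu_x) / sd_x + b)


# Synthetic, noise-free data on a deliberately awkward scale (illustrative only).
raw_x = [1000.0 + 10.0 * i for i in range(50)]
raw_y = [3.0 * x + 50.0 for x in raw_x]

x_stats = zscore_fit(raw_x)
y_stats = zscore_fit(raw_y)
a, b = fit_coefficients(zscore_apply(raw_x, *x_stats), zscore_apply(raw_y, *y_stats))
```

In the standardised space the same learning rate suits both coefficients, because the input has unit variance and zero mean. On the raw scale, where x is in the thousands, a step size of 0.1 would be expected to diverge for this toy problem, which is one way to read the abstract's claim about gradient descent benefiting from standardisation.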



Published In

GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference
June 2020
1349 pages
ISBN: 9781450371285
DOI: 10.1145/3377930
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature standardisation
  2. genetic programming
  3. gradient descent
  4. symbolic regression

Qualifiers

  • Research-article

Conference

GECCO '20

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Cited By

  • (2024) Mapping Seasonal Spatiotemporal Dynamics of Alpine Grassland Forage Phosphorus Using Sentinel-2 MSI and a DRL-GP-Based Symbolic Regression Algorithm. Remote Sensing 16:21 (4086). DOI: 10.3390/rs16214086. Online publication date: 1-Nov-2024
  • (2024) Using Decomposed Error for Reproducing Implicit Understanding of Algorithms. Evolutionary Computation 32:1 (49-68). DOI: 10.1162/evco_a_00321. Online publication date: 1-Mar-2024
  • (2024) Characterising the Double Descent of Symbolic Regression. Proceedings of the Genetic and Evolutionary Computation Conference Companion (2050-2057). DOI: 10.1145/3638530.3664176. Online publication date: 14-Jul-2024
  • (2024) Function Class Learning with Genetic Programming: Towards Explainable Meta Learning for Tumor Growth Functionals. Proceedings of the Genetic and Evolutionary Computation Conference (1354-1362). DOI: 10.1145/3638529.3654145. Online publication date: 14-Jul-2024
  • (2024) Effects of Reducing Redundant Parameters in Parameter Optimization for Symbolic Regression using Genetic Programming. Journal of Symbolic Computation (102413). DOI: 10.1016/j.jsc.2024.102413. Online publication date: Dec-2024
  • (2024) Revisiting Bagging for Stochastic Algorithms. AI 2024: Advances in Artificial Intelligence (162-173). DOI: 10.1007/978-981-96-0351-0_12. Online publication date: 20-Nov-2024
  • (2023) Deep generative symbolic regression with Monte-Carlo-tree-search. Proceedings of the 40th International Conference on Machine Learning (15655-15668). DOI: 10.5555/3618408.3619047. Online publication date: 23-Jul-2023
  • (2023) GECCO'2022 Symbolic Regression Competition: Post-Analysis of the Operon Framework. Proceedings of the Companion Conference on Genetic and Evolutionary Computation (2412-2419). DOI: 10.1145/3583133.3596390. Online publication date: 15-Jul-2023
  • (2023) Mini-Batching, Gradient-Clipping, First- versus Second-Order: What Works in Gradient-Based Coefficient Optimisation for Symbolic Regression? Proceedings of the Genetic and Evolutionary Computation Conference (1127-1136). DOI: 10.1145/3583131.3590368. Online publication date: 15-Jul-2023
  • (2022) Genetic programming, standardisation, and stochastic gradient descent revisited. Proceedings of the Genetic and Evolutionary Computation Conference Companion (2265-2273). DOI: 10.1145/3520304.3534040. Online publication date: 9-Jul-2022
