Abstract
We examine the genetic evolution-based algorithm for symbolic regression from a probabilistic dynamical perspective. This approach permits us to follow the candidate functions of the search space from generation to generation as they improve their fitness and finally converge to the function that best matches a given data set. In particular, we use this statistical framework to explore the optimal external parameters that govern a special mutation operator, which can systematically improve the numerical values of the constants contained in each candidate formula of the search space. We then apply symbolic regression to the chaotic logistic map and the Lorenz system.




References
Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge: MIT Press; 2016.
Sarachik ES, Cane MA. The El Niño-Southern Oscillation phenomenon. Cambridge: Cambridge University Press; 2010.
Vladislavleva E, Friedrich T, Neumann F, Wagner M. Predicting the energy output of wind farms based on weather data: important variables and their correlation. Renew Energy. 2013;50:236.
Fitzsimmons J, Moscato P. Symbolic regression modelling of drug responses. In: First IEEE Conference on Artificial Intelligence for Industries; 2018.
Graham MJ, Djorgovski SG, Mahabal A, Donalek C, Drake A, Longo G. Data challenges of time domain astronomy. Distr Parallel Databases. 2012;30(5):371.
Schmidt M, Lipson H. Distilling free-form natural laws from experimental data. Science. 2009;324(5923):81.
Udrescu SM, Tegmark M. The Feynman database for symbolic regression. https://space.mit.edu/home/tegmark/aifeynman.html; 2020
Udrescu SM, Tegmark M. AI Feynman: a physics-inspired method for symbolic regression. Sci Adv. 2020;6(16):eaay2631.
Durasevic M, Jakobovic D, Scoczynski Ribeiro Martins M, Picek S, Wagner M. Fitness landscape analysis of dimensionally-aware genetic programming featuring Feynman equations. arXiv:2004.12762v1 [cs.NE]; 2020.
Quade M, Abel M, Shafi K, Niven RK, Noack BR. Prediction of dynamical systems by symbolic regression. Phys Rev E. 2016;94:012214.
Gautier N, Aider JL, Duriez T, Noack B, Segond M, Abel M. Closed-loop separation control using machine learning. J Fluid Mech. 2015;770:442.
Qin H. Machine learning and serving of discrete field theories - when artificial intelligence meets the discrete universe. arXiv:1910.10147; 2019.
Gong C, Su Q, Grobe R. Machine learning techniques in the examination of the electron-positron pair creation process. J Opt Soc Am B. 2021;38:3582–91.
Zimmermann RS, Parlitz U. Observing spatio-temporal dynamics of excitable media using reservoir computing. Chaos. 2018;28:043118.
Tanaka G, Yamane T, Héroux JB, Nakane R, Kanazawa N, Takeda S, Numata H, Nakano D, Hirose A. Recent advances in physical reservoir computing: a review. Neural Netw. 2019;115:100.
Lu Z, Hunt BR, Ott E. Attractor reconstruction by machine learning. Chaos. 2018;28:061104.
Symbolic regression is a relatively young research field and there are no extensive reviews for direct applications in physics yet. Two interesting early articles are [17,18].
Vladislavleva K. Model-based problem solving through symbolic regression via Pareto genetic programming. PhD thesis, Tilburg University; 2008.
Minnebo W, Stijven S. Empowering knowledge computing with variable selection. MSc thesis, University of Antwerp; 2011.
Bruneton JP, Cazenille L, Douin A, Reverdy V. Exploration and exploitation in symbolic regression using quality-diversity and evolutionary strategies algorithms. arXiv:1906.03959v1 [cs.NE]; 2019.
Koza JR. Genetic programming: on the programming of computers by means of natural selection. Cambridge: MIT Press; 1992.
Koza JR. Genetic programming. Cambridge: MIT Press; 1998.
Lambora A, Gupta K, Chopra K. Genetic algorithm—a literature review. In: International conference on machine learning, big data, cloud and parallel computing (COMITCon); 2019, p 380.
Miller B, Goldberg D. Genetic algorithms, tournament selection and the effects of noise. Complex Syst. 1995;9:193.
Blickle T, Thiele L. A comparison of selection schemes used in evolutionary algorithms. Evol Comput. 1996;4:361.
Goldberg D, Deb K. A comparative analysis of selection schemes used in genetic algorithms. Found Genet Algor. 1991;1:69.
Holland JH. Adaptation in natural and artificial systems. Cambridge: MIT Press; 1975.
Gavrilets S. Fitness landscapes and the origin of species. Princeton: Princeton University Press; 2004.
McCandlish DM. Visualizing fitness landscapes. Evolution. 2011;65:1544.
Wright S. The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proc Six Int Congr Genet. 1932;1:355.
Richter H, Engelbrecht A. Recent advances in the theory and application of fitness landscapes. Heidelberg: Springer; 2014.
May R. Simple mathematical models with very complicated dynamics. Nature. 1976;261:459.
Tan JPL. Simulated extrapolated dynamics with parametrization networks. arXiv:1902.03440v1 [nlin.CD]; 2019.
Lorenz EN. Deterministic nonperiodic flow. J Atmos Sci. 1963;20(2):130.
Acknowledgements
We would like to thank Profs. B.K. Clark, X. Fang, Z.L. Li, Y.J. Li and G.H. Rutherford, and G. Jacob, Z. Smozhanyk, and T. Walsh for many helpful discussions and suggestions. This work has been supported by the NSF. C.G. would like to thank ILP for the nice hospitality during his visit to Illinois State University and acknowledges the China Scholarship Council program for his PhD research. We also acknowledge access to the HPC cluster provided by Illinois State University.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
(A) \(P_{m}(t)\) for a model system of M classes
To get some first insight into the time scales of the temporal evolution of the proportions \(P_{m}(t)\) of each class under the tournament selection, we present in this appendix an oversimplified system of M classes, where the corresponding initial fitness densities \(\rho _{m}(f,t=0)\) are so narrow that they do not overlap with each other. This permits us to derive a universal iteration scheme, where the proportions \(P_{m}(t)\) can be computed directly from the set of \(P_{m}(t=0)\) without specifying the shape of \(\rho _{m}(f,t=0)\).
In general, the new set of proportions \(P_{m}(t+1)\) after the application of all \(N_\mathrm{pop}\) tournaments can be obtained via the expression
\[ P_{m}(t+1)=P_{m}(t)\int _{0}^{\infty }\text {d}f\,\rho _{m}(f,t)\,S(f,t). \qquad (14) \]
The assumption of non-overlapping densities means that we can assign each class a unique mean fitness value, defined as \(\int _{0}^{\infty }\text {d}f'\ f'\rho _{m}(f',t)\equiv f_{m}\). This permits us to order the class labels such that their associated mean fitness increases with increasing label m, i.e., \(f_{m}<f_{m+1}\). If we further assume that each \(\rho _{m}(f,t)\) is essentially non-zero only in the interval \([f_{m}-\varDelta f/2, f_{m}+\varDelta f/2]\), then the integration range of the first integral \(\int _{0}^{\infty }\text {d}f\) of Eq. (14) can be approximated by \(\int _{f_{m}-\varDelta f/2}^{f_{m}+\varDelta f/2}\text {d}f\). This means that the largest upper integration value f of the second integral \(\int _{0}^{f}\text {d}f'\) is at most \(f=f_{m}+\varDelta f/2\). As a result, some of the integrals in \(S(f,t)=n_{T}[1-\sum _{m'=1}^{M}P_{m'}(t)\int _{0}^{f}\text {d}f'\rho _{m'} (f',t)]^{n_{T}-1}\) can be partially evaluated and therefore simplify significantly. The densities \(\rho _{m'}(f',t)\) with a mean fitness lower than \(f_{m}\) are integrated over their entire extent, so that we can use \(\int _{f_{m'}-\varDelta f/2}^{f_{m'}+\varDelta f/2}\text {d}f'\,\rho _{m'}(f',t)=1\). As a result, we obtain \(\sum _{m'=1}^{M}P_{m'}(t)\int _{0}^{f}\text {d}f'\,\rho _{m'}(f',t) =\sum _{m'=1}^{m-1}P_{m'}(t)+P_{m}(t)\int _{0}^{f}\text {d}f'\,\rho _{m}(f',t)\). This permits us to represent the entire integrand of Eq. (14) as a total derivative with respect to f and we obtain
\[ P_{m}(t+1)=\Big [1-\sum _{m'=1}^{m-1}P_{m'}(t)\Big ]^{n_{T}}-\Big [1-\sum _{m'=1}^{m}P_{m'}(t)\Big ]^{n_{T}}. \qquad (15) \]
If we expand this set of equations, we obtain the sequence of mutually coupled iterations
\[ \begin{aligned} P_{1}(t+1)&=1-[1-P_{1}(t)]^{n_{T}},\\ P_{2}(t+1)&=[1-P_{1}(t)]^{n_{T}}-[1-P_{1}(t)-P_{2}(t)]^{n_{T}},\\ &\;\;\vdots \\ P_{M}(t+1)&=[P_{M}(t)]^{n_{T}}. \end{aligned} \qquad (16) \]
This means that we have obtained an iteration scheme that computes the class proportions \(P_{m}(t+1)\) of the next generation solely from the proportions \(P_{m'}(t)\) of the classes with a lower (or equal) fitness, i.e., with \(m'\le m\). One can easily convince oneself that the norm is preserved by this set of maps, i.e., \(\sum _{m=1}^{M}P_{m}(t+1)=\sum _{m=1}^{M} P_{m}(t)=1\): the sum over Eq. (16) telescopes to \(1-[1-\sum _{m=1}^{M}P_{m}(t)]^{n_{T}}=1\).
Fig. 5 Evolution of the proportions \(P_{m}(t)\) for \(M=10\) classes during the first five generations according to the model given by Eq. (16). The initial proportions were chosen as \(P_{m}(t=0)=1/M\). Only the proportions with the three lowest fitness values are labeled
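The iteration in Eq. (16) is straightforward to evaluate numerically. The following minimal Python sketch (not part of this work's code base; the helper name tournament_step is ours) advances the proportions for the parameters used in Fig. 5, i.e., \(M=10\) classes, uniform initial proportions and tournament size \(n_{T}=2\), and checks the norm preservation discussed above.

import numpy as np

def tournament_step(P, n_T):
    # One generation of Eq. (16):
    # P_m(t+1) = [1 - sum_{m'<m} P_{m'}(t)]^{n_T} - [1 - sum_{m'<=m} P_{m'}(t)]^{n_T}
    cum = np.cumsum(P)                           # partial sums over m' <= m
    below = np.concatenate(([0.0], cum[:-1]))    # partial sums over m' < m
    return (1.0 - below) ** n_T - (1.0 - cum) ** n_T

M, n_T = 10, 2                      # setup of Fig. 5
P = np.full(M, 1.0 / M)             # uniform initial proportions P_m(0) = 1/M
for t in range(5):                  # first five generations
    P = tournament_step(P, n_T)
    assert abs(P.sum() - 1.0) < 1e-12   # the telescoping sum preserves the norm
    print(t + 1, np.round(P, 4))

Already after the first generation the output shows the redistribution toward the low-fitness classes, e.g., \(P_{1}(1)=1-0.9^{2}=0.19\) while \(P_{10}(1)=0.1^{2}=0.01\), in line with the behavior displayed in Fig. 5.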
As an interesting side note, we remark that despite the nonlinear character of these iterative maps, for the class with the lowest fitness, \({m}=1\), we have the simpler iterative scheme \(P_{1}(t+1)=1-[1-P_{1}(t)]^{n_{T}}\), which converges monotonically to \(P_{1}(t\rightarrow \infty )\rightarrow 1\). If we introduce the complementary proportion \(Q_{1}(t)\equiv 1-P_{1}(t)\), we have \(1-P_{1}(t+1)=[1-P_{1}(t)]^{n_{T}}\) such that \(Q_{1}(t+1)=Q_{1}(t)^{n_{T}}\). This map has the closed-form solution \(Q_{1}(t)=Q_{1}(0)^{n_{T}^{t}}\), such that \(P_{1}(t)=1-[1-P_{1}(0)]^{n_{T}^{t}}\), so \(P_{1}(t)\) grows monotonically toward 1, the faster the larger the tournament size \(n_{T}\), and independently of the proportions \(P_{m}\) of the other classes, as one might expect. While the growth is monotonic, its time scale depends not only on \(n_{T}\), but also very sensitively on the initial value \(P_{1}(0)\). If \(P_{1}(0)\ll 1\), then for short times \(P_{1}(t)\approx n_{T}^{t}P_{1}(0)\), i.e., \(P_{1}(t)\) initially grows geometrically with a factor \(n_{T}\) per generation.
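For completeness, the closed form quoted above follows by unrolling the complementary map one generation at a time:
\[ Q_{1}(t)=\big [Q_{1}(t-1)\big ]^{n_{T}}=\big [Q_{1}(t-2)\big ]^{n_{T}^{2}}=\cdots =\big [Q_{1}(0)\big ]^{n_{T}^{t}}, \]
and the same unrolling applies to the map for the highest-fitness class discussed below, yielding \(P_{M}(t)=[P_{M}(0)]^{n_{T}^{t}}\).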
At the opposite end, if m matches the total number of classes, i.e., \(m=M\), then the iteration scheme for the class with the largest fitness \(f_{M}\) simplifies to
\[ P_{M}(t+1)=[P_{M}(t)]^{n_{T}}. \qquad (17) \]
This permits us to find the complete time evolution for \(t=1,2,\ldots \) as \(P_{M}(t)=[P_{M}(0)]^{n_{T}^{t}}\), i.e., a universal monotonic decay toward zero that becomes faster with increasing tournament size \(n_{T}\).
The time evolution of all the other proportions \(P_{m}(t)\) with \(m\ne 1\) and \(m\ne M\) can be non-monotonic. As an example, in Fig. 5 we show the evolution of \(M=10\) classes with \(P_{m}(t=0)=1/M\) for the first five generations with a tournament size \(n_{T}=2\). We see that the low-fitness proportions \(P_{m}(t)\) (for \({m}=1,\ldots ,5\)) increase at first and then decay, except for \(P_{1}(t)\), which approaches 1 monotonically.
Cite this article
Gong, C., Bryan, J., Furcoiu, A. et al. Evolutionary Symbolic Regression from a Probabilistic Perspective. SN COMPUT. SCI. 3, 209 (2022). https://doi.org/10.1007/s42979-022-01094-0