DOI: 10.1145/3583131.3590346
research-article

Reducing Overparameterization of Symbolic Regression Models with Equality Saturation

Published: 12 July 2023

ABSTRACT

Overparameterized models in regression analysis are often harder to interpret and can be harder to fit because of ill-conditioning. Genetic programming (GP) is prone to producing overparameterized models because it evolves the structure of the model without taking the location of the parameters into account. One way to alleviate this is to rewrite the expression and merge the redundant fitting parameters. In this paper we propose the use of equality saturation for that purpose. We first observe that all the tested GP implementations suffer from overparameterization to different extents, and then show that equality saturation together with a small set of rewriting rules is capable of reducing the number of fitting parameters to a minimum with high probability. Compared to SymPy, one of the few available alternatives, it produces much better and more consistent results. These results open several directions for future investigation, such as simplifying expressions during the evolutionary process and improving the interpretability of symbolic models.
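To make the idea of merging redundant fitting parameters concrete, the following is a minimal sketch in Python. It is not equality saturation and not the rule set used in the paper: it applies two hand-picked rewrites greedily, bottom-up, on a toy expression tree. The tuple encoding and the names `count_params` and `merge_params` are assumptions introduced only for this illustration.

```python
from itertools import count

# Toy encoding (an assumption of this sketch):
#   ("param", name)  a fitting parameter
#   ("var", name)    an input variable
#   (op, a, b)       a binary node, with op in {"+", "*"}


def count_params(expr):
    """Count the fitting parameters in an expression tree."""
    if expr[0] == "param":
        return 1
    if expr[0] == "var":
        return 0
    return sum(count_params(child) for child in expr[1:])


def merge_params(expr, fresh=None):
    """Greedy bottom-up merging of redundant parameters.

    Rules (both "+" and "*" are commutative and associative):
      op(p, q)        -> r          two parameters collapse into one
      op(p, op(q, e)) -> op(r, e)   nested parameters of the same op collapse
    """
    if fresh is None:
        fresh = count()
    if expr[0] in ("param", "var"):
        return expr
    op, left, right = expr
    left = merge_params(left, fresh)
    right = merge_params(right, fresh)
    # Commutativity: keep a lone parameter on the left-hand side.
    if right[0] == "param" and left[0] != "param":
        left, right = right, left
    if left[0] == "param" and right[0] == "param":
        return ("param", f"m{next(fresh)}")
    if left[0] == "param" and right[0] == op and right[1][0] == "param":
        return (op, ("param", f"m{next(fresh)}"), right[2])
    return (op, left, right)


# p0 * (p1 * x) + (p2 + p3): four parameters, but only two are identifiable.
expr = (
    "+",
    ("*", ("param", "p0"), ("*", ("param", "p1"), ("var", "x"))),
    ("+", ("param", "p2"), ("param", "p3")),
)
merged = merge_params(expr)

print(count_params(expr))    # 4
print(count_params(merged))  # 2
```

A greedy pass like this depends on the order in which rules happen to fire and can miss merges that only become visible after another rewrite. Equality saturation avoids that ordering problem by keeping all equivalent forms and extracting the best one afterwards.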


Published in

GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
July 2023
1667 pages
ISBN: 9798400701191
DOI: 10.1145/3583131

Copyright © 2023 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
