DOI: 10.1145/3583133.3590525
poster

Fighting Underspecification in Symbolic Regression with Fitness Sharing

Published: 24 July 2023

ABSTRACT

Underspecification occurs when several plausible hypotheses fit the same training and validation data but behave differently when evaluated outside the training domain or distribution. Symbolic regression algorithms are prone to underspecification because they have an additional degree of freedom: they must also determine the structure of the regression model. When facing several likely alternatives, some algorithms apply Occam's razor and choose the simplest one. However, not only is there no guarantee that this is the correct decision, but the definition of "simplest" in symbolic regression is also subjective. In this work we analyse the use of diversity control mechanisms to help fight underspecification by providing the end user with multiple alternative models in a single execution. A practitioner with additional knowledge can then examine these alternatives in a post-analysis step. For this purpose, we implemented a fitness sharing mechanism in the Transformation-Interaction-Rational Symbolic Regression algorithm, using a distance function that measures how differently two models behave outside the domain of the training data. The results showed that this adaptation is capable of producing multiple alternatives with similar fitness but distinct behavior outside this domain.
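The abstract does not specify the exact distance function or sharing formula used in the Transformation-Interaction-Rational implementation. As a rough illustration only, the following Python sketch shows how fitness sharing can be driven by behavior outside the training domain. It assumes that models are plain Python callables, that out-of-domain points are drawn by rejection sampling from a box enlarged by a hypothetical `margin`, that the distance is the root-mean-square difference of predictions on those points, and that the classic sharing function sh(d) = 1 - (d/σ)^α with niche radius σ is used. All function names are hypothetical; this is not the authors' code.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical model type: maps one input point to a prediction.
Model = Callable[[Sequence[float]], float]


def sample_outside_domain(lows: Sequence[float], highs: Sequence[float],
                          n: int, margin: float = 0.5,
                          seed: int = 0) -> List[List[float]]:
    """Rejection-sample n points from a box enlarged by `margin` times the
    width of each training range, keeping only points strictly outside the
    training box [lows, highs]. Assumes every range has positive width."""
    rng = random.Random(seed)
    points: List[List[float]] = []
    while len(points) < n:
        p = [rng.uniform(lo - margin * (hi - lo), hi + margin * (hi - lo))
             for lo, hi in zip(lows, highs)]
        inside = all(lo <= x <= hi for x, lo, hi in zip(p, lows, highs))
        if not inside:
            points.append(p)
    return points


def behavior_distance(f: Model, g: Model,
                      points: Sequence[Sequence[float]]) -> float:
    """Root-mean-square difference between the predictions of f and g on
    `points`; returns infinity if any difference is not finite."""
    total = 0.0
    for p in points:
        diff = f(p) - g(p)
        if not math.isfinite(diff):
            return math.inf
        total += diff * diff
    return math.sqrt(total / len(points))


def sharing(d: float, sigma: float, alpha: float = 1.0) -> float:
    """Classic sharing function: 1 - (d/sigma)**alpha inside the niche
    radius sigma, 0 outside it."""
    return 1.0 - (d / sigma) ** alpha if d < sigma else 0.0


def shared_fitness(fitness: Sequence[float], models: Sequence[Model],
                   points: Sequence[Sequence[float]], sigma: float,
                   alpha: float = 1.0) -> List[float]:
    """Divide each raw fitness (assumed non-negative and to be maximized)
    by its niche count, i.e. the sum of sharing values over the behavioral
    distances to every model in the population."""
    shared = []
    for i, f in enumerate(models):
        niche = 1.0  # each model shares with itself (distance 0)
        for j, g in enumerate(models):
            if j != i:
                niche += sharing(behavior_distance(f, g, points), sigma, alpha)
        shared.append(fitness[i] / niche)
    return shared


if __name__ == "__main__":
    linear = lambda p: p[0]
    clipped = lambda p: min(1.0, max(0.0, p[0]))  # equals linear on [0, 1]
    pts = sample_outside_domain([0.0], [1.0], n=200)
    print(shared_fitness([1.0, 1.0, 1.0], [linear, linear, clipped], pts,
                         sigma=0.1))
```

In the usage example, the two identical linear models fall into the same niche, so each keeps only half of its raw fitness, while the clipped model, which agrees with them on the training range [0, 1] but is constant outside it, keeps its full fitness. This is the kind of selective pressure that lets models with similar training error but distinct extrapolation behavior coexist. If fitness is an error to be minimized, as is common in symbolic regression, it would first have to be converted into a non-negative score to be maximized; that conversion is outside this sketch.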


Published in

GECCO '23 Companion: Proceedings of the Companion Conference on Genetic and Evolutionary Computation
July 2023
2519 pages
ISBN: 9798400701207
DOI: 10.1145/3583133

Copyright © 2023 Owner/Author(s)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery
New York, NY, United States