research-article

MLStar: A System for Synthesis of Machine-Learning Programs

Authors:

Gabriel Kopito,

Jonathan Schwartz,

Julien Amblard,

Landon Rabern

GECCO '23 Companion: Proceedings of the Companion Conference on Genetic and Evolutionary Computation

Pages 1721 - 1726

https://doi.org/10.1145/3583133.3596367

Published: 24 July 2023

Abstract

This paper introduces our auto-ML system, MLStar, which uses genetic programming to create scikit-learn and Keras-based Python programs that perform supervised learning. MLStar builds on our own genetic programming system (GPStar4) and offers a larger search space than traditional genetic programming frameworks.

Key elements that enable MLStar's performance include representing individuals as Directed Acyclic Graphs (DAGs), a rich type system that shapes the kinds of graphs generated, novel genetic operators that work on the DAG structure, and advanced hyperparameter tuning via the Optuna hyperparameter optimization framework. MLStar also offers multiobjective fitnesses and a variety of complex population types.
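As a rough illustration of how a type system can constrain a DAG-shaped individual, the following minimal sketch may help. It is not MLStar's or GPStar4's actual representation: the `Node` and `Individual` classes, the type names `"Table"` and `"Predictions"`, and the node names are all hypothetical and chosen only for this example. A node may be attached only to parents whose output types match its declared input types, and because every parent must already exist when a node is added, the graph stays acyclic by construction.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Node:
    """One pipeline step (hypothetical; not an MLStar class)."""
    name: str
    in_types: tuple[str, ...]  # type expected from each parent, in order
    out_type: str              # type this node produces


@dataclass
class Individual:
    """A typed DAG: nodes plus, for each node, its ordered parents."""
    nodes: dict = field(default_factory=dict)    # name -> Node
    parents: dict = field(default_factory=dict)  # name -> list of parent names

    def add(self, node, parents=()):
        parents = list(parents)
        if len(parents) != len(node.in_types):
            raise TypeError(f"{node.name}: expected {len(node.in_types)} parents")
        for parent, expected in zip(parents, node.in_types):
            actual = self.nodes[parent].out_type  # parent must already exist
            if actual != expected:
                raise TypeError(f"{node.name}: {parent} yields {actual}, not {expected}")
        self.nodes[node.name] = node
        self.parents[node.name] = parents

    def evaluation_order(self):
        # Parents always come before the nodes that consume them.
        return list(TopologicalSorter(self.parents).static_order())


# "scale" feeds two consumers, which a tree representation could not share.
ind = Individual()
ind.add(Node("input", (), "Table"))
ind.add(Node("scale", ("Table",), "Table"), ["input"])
ind.add(Node("pca", ("Table",), "Table"), ["scale"])
ind.add(Node("concat", ("Table", "Table"), "Table"), ["scale", "pca"])
ind.add(Node("clf", ("Table",), "Predictions"), ["concat"])
order = ind.evaluation_order()
```

In this sketch a genetic operator that tried to wire the classifier's output into a step expecting a `"Table"` would be rejected with a `TypeError`, which is one simple way a type system can restrict the graphs a search is allowed to generate.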

We show that MLStar compares favorably with several other auto-ML frameworks on benchmark tests. We also demonstrate that MLStar can produce competitive solutions even when running with computationally expensive features disabled.

Index Terms

Computing methodologies → Machine learning → Machine learning approaches → Bio-inspired approaches → Genetic programming

Published In

GECCO '23 Companion: Proceedings of the Companion Conference on Genetic and Evolutionary Computation

July 2023

2519 pages

ISBN: 9798400701207

DOI: 10.1145/3583133

Chair: Sara Silva

Program Chair: Luís Paquete

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation

Publisher

Association for Computing Machinery

New York, NY, United States

GECCO '23 Companion: Companion Conference on Genetic and Evolutionary Computation

July 15 - 19, 2023

Lisbon, Portugal

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
