Elsevier

Applied Soft Computing

Volume 9, Issue 1, January 2009, Pages 49-60

An autonomous GP-based system for regression and classification problems

https://doi.org/10.1016/j.asoc.2008.03.008

Abstract

The aim of this research is to develop an autonomous system for solving data analysis problems. The system, called Genetic Programming-Autonomous Solver (GP-AS), contains most of the features required of autonomous software: it decides whether or not it knows how to solve a particular problem, it can construct solutions for new problems, it can store the created solutions for later use, it can improve the existing solutions during idle time, it can manage computer resources efficiently for fast running speed, and it can detect and handle failure cases. The generator of solutions for new problems is based on an adaptive variant of Genetic Programming. We have tested this part by solving some well-known problems in the field of symbolic regression and classification. Numerical experiments show that the GP-AS system performs very well on the considered test problems and competes successfully with standard GP whose parameters are set manually.

Introduction

Autonomous systems [5] are of high interest due to their ability to perform various tasks without relying on human intervention. A computer program is autonomous if the user does not have to change its parameters when a new problem has to be solved. To achieve this goal, the following features must be implemented:

  • The ability to perform well under significant uncertainties in the system and environment for extended periods of time.

  • The ability to recognize if it knows how to solve a problem or not.

  • The ability to create new solutions for new problems.

  • The ability to learn from previous experience.

  • The ability to use the computer resources in an efficient manner.

  • The ability to identify failure conditions.

  • The ability to improve the existing solutions during idle time.

The purpose of this research is to build a system meeting the criteria described above. We have limited our attention to symbolic regression and classification problems for the following reasons:

  • These problems are of great interest because they arise in many real-world applications.

  • The input and output have a well-defined structure that is easy to handle: arrays of symbols.

Our system, called Genetic Programming-Autonomous Solver (GP-AS), consists of six main parts: a Decision Maker, a Trainer, a Solver Repository, a Repository Manager, an Idle-time Manager and a Failure Manager. When a problem is presented to the system, the Decision Maker decides which Solver will try to solve it by sending a request to the Repository Manager, which in turn queries its database. If no suitable Solver is found, the Decision Maker activates the Trainer, which tries to train a Solver for that problem. The new Solver is added to the Repository for later use. New Solvers are trained on examples requested from the user. When no request is sent to the system, it tries to improve the existing solutions by calling the Idle-time Manager. The Failure Manager detects possible failures of the system and acts accordingly.
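The request flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, Solvers are looked up by a plain problem identifier (the text does not say how the Decision Maker matches problems to Solvers), and the Trainer is replaced by a trivial mean predictor instead of adaptive GP.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Example = Tuple[Sequence[float], float]      # (inputs, expected output)
Solver = Callable[[Sequence[float]], float]  # a trained program


@dataclass
class RepositoryManager:
    """Hypothetical store that maps a problem identifier to its Solver."""
    solvers: Dict[str, Solver] = field(default_factory=dict)

    def find(self, problem_id: str) -> Optional[Solver]:
        return self.solvers.get(problem_id)

    def add(self, problem_id: str, solver: Solver) -> None:
        self.solvers[problem_id] = solver


def train_mean_solver(examples: List[Example]) -> Solver:
    """Placeholder Trainer (assumption): predicts the mean target value.

    The system in the text trains Solvers with adaptive GP instead.
    """
    mean = sum(target for _, target in examples) / len(examples)
    return lambda inputs: mean


@dataclass
class DecisionMaker:
    repository: RepositoryManager
    trainer: Callable[[List[Example]], Solver]
    request_examples: Callable[[str], List[Example]]  # asks the user for data

    def solve(self, problem_id: str, inputs: Sequence[float]) -> float:
        solver = self.repository.find(problem_id)
        if solver is None:
            # No suitable Solver is known: train one and store it for reuse.
            examples = self.request_examples(problem_id)
            solver = self.trainer(examples)
            self.repository.add(problem_id, solver)
        return solver(inputs)
```

With this layout, a second request for the same problem is served from the repository without asking the user for examples again. The Idle-time Manager and Failure Manager are left out of the sketch.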

The Trainer has been used for solving several interesting and difficult problems: even-parity and 22 real-world problems taken from PROBEN1 [41].
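For readers unfamiliar with the even-parity benchmark, the sketch below builds its training examples: the target is 1 when an even number of the Boolean inputs are 1, and 0 otherwise. The function names and the 0/1 encoding are assumptions made for illustration, not the input format used by GP-AS.

```python
from itertools import product


def even_parity(bits):
    """Return 1 when an even number of the inputs are 1, otherwise 0."""
    return 1 if sum(bits) % 2 == 0 else 0


def parity_examples(n):
    """Enumerate all 2**n input combinations with their even-parity target."""
    return [(bits, even_parity(bits)) for bits in product((0, 1), repeat=n)]


if __name__ == "__main__":
    for inputs, target in parity_examples(3):
        print(inputs, target)
```

The number of examples doubles with every additional input, and flipping any single input flips the target.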

It is difficult to compare the GP-AS system with other problem solvers because the experimental conditions differ. Comparisons between GP and other techniques have been performed previously by other authors [10], [17]. Here we have performed a raw comparison for the numerical experiments required to evolve Solvers. The results obtained by GP-AS are generally worse than those obtained by systems that use a fixed population size and a fixed maximal tree height, although there are a few cases where the GP-AS Trainer performs better than standard GP. However, the comparison is not fair because the experimental conditions are different: GP-AS uses an adaptive mechanism for population size and chromosome size, which means that it does not know the optimal values of these parameters for a given problem. Other systems, by contrast, rely on several experimental trials performed beforehand to find this information.

The paper is organized as follows. Related work is reviewed in Section 2. The extension of Genetic Programming used for training Solvers is described in Section 3. The way in which multiple outputs can be handled by GP is discussed in detail in Section 3.2. Fitness assignment for regression and classification problems is described in Sections 3.3.1 and 3.3.2, respectively. The proposed GP-AS system is presented in Section 4. The structure of the input required by the GP-AS system is discussed in Section 4.1. The Decision Maker is presented in Section 4.2. The Trainer and its underlying algorithm are presented in Section 4.3. The Idle-time Manager and the Failure Manager are described in Sections 4.6 and 4.7, respectively. Numerical experiments are performed in Section 5, where we also discuss their results. Conclusions and future research directions are outlined in Section 6.

Section snippets

Related work

Developing automated problem solvers is one of the central themes of mathematics and computer science.

The source of inspiration for most of these approaches has been nature and the human brain. In this section, we briefly review existing work in the field of general problem solvers and adaptive techniques.

Extending genetic programming

The most important part of the GP-AS system is the Trainer, which is used for generating new Solvers based on some examples. Genetic programming [27], [28] is used as the underlying mechanism for the Trainer. The GP technique is described in this section. We have enriched the GP individuals with the ability to output multiple values.
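As a rough illustration of the kind of individual a tree-based GP Trainer manipulates, the sketch below evaluates a small expression tree and scores it against training examples. The nested-tuple encoding, the function set, the variable naming ("x0", "x1", ...) and the sum-of-absolute-errors fitness are all assumptions made for this example; the representation, the multiple-output mechanism and the fitness assignment actually used by GP-AS are not reproduced here.

```python
import operator

# Assumed function set for the illustration.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, inputs):
    """Recursively evaluate a tree given as nested (op, left, right) tuples."""
    if isinstance(node, tuple):
        op, left, right = node
        return FUNCTIONS[op](evaluate(left, inputs), evaluate(right, inputs))
    if isinstance(node, str):
        # Terminal naming a variable, e.g. "x1" reads inputs[1].
        return inputs[int(node[1:])]
    return node  # numeric constant terminal


def fitness(tree, examples):
    """Sum of absolute errors over the examples (lower is better)."""
    return sum(abs(evaluate(tree, x) - y) for x, y in examples)


if __name__ == "__main__":
    tree = ("+", ("*", "x0", "x0"), "x1")   # encodes x0 * x0 + x1
    examples = [([1, 2], 3), ([2, 0], 4)]
    print(fitness(tree, examples))
```

In this single-output form, the value computed at the root of the tree is the program's output.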

Detailed description of GP-autonomous solver

Our purpose is to build an autonomous system that does not rely on human intervention when solving problems. To achieve this, we have to provide the system with as much information as we can. Section 4.1 describes what information the system needs and the correct format for supplying it.

The system is organized as follows:

  • (i) a Decision Maker (described in Section 4.2),

  • (ii) a Trainer (described in Section 4.3),

  • (iii) a Solver Repository (described in Section 4.4),

  • (iv) an Idle-time Manager (described in

Numerical experiments

We perform some numerical experiments for creating solvers. The tested problems are:

  • Boolean function finding,

  • symbolic regression,

  • classification.

Regression and classification problems actually refer to several real-world problems taken from PROBEN1 [41] (which have been adapted from the UCI Machine Learning Repository [54]). Linear GP has been previously used [10] to solve problems from this set. For all the problems, we deal with one output only. Therefore, the root of the GP tree provides

Conclusions and further work

A system called GP-AS has been investigated in this paper. GP-AS has six main components: a Decision Maker, a Trainer, a Repository of Problem Solvers, a Repository Manager, an Idle-Time Manager and a Failure Manager. The decision on whether the system knows how to solve a problem or not is taken by the Decision Maker. The Trainer – which is the most important part of the system – is based on an adaptive variant of Genetic Programming. Each problem has its own Solver (a GP program) which is

Acknowledgments

The authors thank the anonymous reviewers for their useful suggestions. This research was supported by grant IDEI-543 from CNCSIS.

References (51)

  • J. Reed et al., Simulation of biological evolution and machine learning I. selection of self-reproducing numeric patterns by data processing machines, effects of hereditary control, mutation type and crossing, J. Theor. Biol. (1967)
  • P.J. Angeline et al., The evolutionary induction of subroutines
  • P.J. Angeline et al., Coevolving high-level representations
  • P.J. Angeline, Adaptive and self-adaptive evolutionary computations
  • P.J. Angeline, Two self-adaptive crossover operators for genetic programming, Advances in Genetic Programming 2 (1996)
  • P.J. Antsaklis et al., Towards intelligent autonomous control systems: architecture and fundamental issues, J. Intell. Robot. Syst. (1989)
  • T. Back, Self-adaptation in genetic algorithms
  • T. Back, The interaction of mutation rate, selection, and self-adaptation within a genetic algorithm
  • J.D. Bagley, The behavior of adaptive systems which employ genetic and correlation algorithms, PhD Thesis, University...
  • W. Banzhaf et al., Genetic Programming: An Introduction; On the Automatic Evolution of Computer Programs and its Applications (2001)
  • M. Brameier et al., A comparison of linear genetic programming and neural networks in medical data mining, IEEE Trans. Evol. Comput. (2001)
  • L. Breiman, Bias, variance, and arcing classifiers, Technical Report 460, Statistics Department, Berkeley,...
  • M.J. Colaco et al., Control of unsteady solidification via optimized magnetic fields, Mater. Manuf. Process. (2005)
  • M.J. Colaco, G.S. Dulikravich, Solidification of double-diffusive flows using thermo magneto-hydrodynamics and...
  • F. Corno et al., Exploiting auto-adaptive micro gp for highly effective test programs generation
  • L. Davis, Adapting probabilities in genetic algorithms
  • C.E. Rasmussen, R.M. Neal, G. Hinton, D. van Camp, M. Revow, Z. Ghahramani, K. Kustra, R. Tibshirani, Data for...
  • D. Edelman, A comparative study of neural network, genetic programming, and support-vector machine methods in forecasting financial time series
  • J. Eggermont et al., Adaptive genetic programming applied to new and existing simple regression problems, genetic programming
  • A.E. Eiben et al., Parameter control in evolutionary algorithms, IEEE Trans. Evol. Comput. (1999)
  • A.E. Eiben et al., SAW-ing EAs: adapting the fitness function for solving constrained problems
  • L.J. Fogel et al., A preliminary investigation on extending evolutionary programming to include self-adaptation on finite state machines, Informatica (1994)
  • J.H. Friedman, Multivariate adaptive regression splines, Ann. Stat. (1991)
  • D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (1989)
  • C. Grosan et al., Adaptive representation for single objective optimization, Soft Comput. (2005)