An autonomous GP-based system for regression and classification problems

doi:10.1016/j.asoc.2008.03.008

Applied Soft Computing

Volume 9, Issue 1, January 2009, Pages 49-60

https://doi.org/10.1016/j.asoc.2008.03.008

Abstract

The aim of this research is to develop an autonomous system for solving data analysis problems. The system, called Genetic Programming-Autonomous Solver (GP-AS), contains most of the features required by autonomous software: it decides whether or not it knows how to solve a particular problem, it can construct solutions for new problems, it can store the created solutions for later use, it can improve the existing solutions during idle time, it can manage computer resources efficiently for fast running speed, and it can detect and handle failure cases. The generator of solutions for new problems is based on an adaptive variant of Genetic Programming. We have tested this part by solving some well-known problems in the field of symbolic regression and classification. Numerical experiments show that the GP-AS system performs very well on the considered test problems and is able to compete successfully with standard GP using manually set parameters.

Introduction

Autonomous systems [5] are of high interest due to their ability to perform various tasks without relying on human intervention. A computer program is autonomous if the user does not have to change its parameters when a new problem has to be solved. To achieve this goal, the following features must be implemented:

•
The ability to perform well under significant uncertainties in the system and environment for extended periods of time.
•
The ability to recognize whether or not it knows how to solve a problem.
•
The ability to create new solutions for new problems.
•
The ability to learn from previous experience.
•
The ability to use computer resources efficiently.
•
The ability to identify failure conditions.
•
The ability to improve the existing solutions during idle time.

The purpose of this research is to build a system that meets the criteria described above. We have limited our attention to symbolic regression and classification problems for several reasons:

•
These problems are of great interest because they arise in many real-world applications.
•
The input and output have a well-defined structure that is easy to handle: arrays of symbols.

Our system, called Genetic Programming-Autonomous Solver (GP-AS), consists of six main parts: a Decision Maker, a Trainer, a Solver Repository, a Repository Manager, an Idle-time Manager and a Failure Manager. When a problem is presented to the system, the Decision Maker decides which Solver will try to solve it by sending a request to the Repository Manager, which in turn queries its database. If no suitable Solver is found, the Decision Maker activates the Trainer, which tries to train a Solver for that problem using examples requested from the user. The new Solver is then added to the Repository for later use. When no request is sent to the system, it tries to improve the existing solutions by calling the Idle-time Manager. Finally, the Failure Manager detects possible failures of the system and acts accordingly.
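The request flow described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the GP-AS implementation: the class names `SolverRepository` and `DecisionMaker`, the `problem_id` lookup key, and the `trainer` and `request_examples` callables are all hypothetical, and the Repository Manager's database query is reduced to a dictionary lookup.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types: a Solver maps an input vector to an output vector,
# and a training example is an (inputs, outputs) pair.
Solver = Callable[[List[float]], List[float]]
Example = Tuple[List[float], List[float]]


@dataclass
class SolverRepository:
    """Keeps one trained Solver per problem identifier (a hypothetical key)."""
    solvers: Dict[str, Solver] = field(default_factory=dict)

    def find(self, problem_id: str) -> Optional[Solver]:
        return self.solvers.get(problem_id)

    def store(self, problem_id: str, solver: Solver) -> None:
        self.solvers[problem_id] = solver


class DecisionMaker:
    """Uses a stored Solver if one exists; otherwise trains and stores a new one."""

    def __init__(self, repository: SolverRepository,
                 trainer: Callable[[List[Example]], Solver],
                 request_examples: Callable[[str], List[Example]]) -> None:
        self.repository = repository
        self.trainer = trainer                    # stands in for the GP-based Trainer
        self.request_examples = request_examples  # asks the user for training examples

    def solve(self, problem_id: str, inputs: List[float]) -> List[float]:
        solver = self.repository.find(problem_id)
        if solver is None:
            # Unknown problem: get examples, train a Solver, keep it for later use.
            solver = self.trainer(self.request_examples(problem_id))
            self.repository.store(problem_id, solver)
        return solver(inputs)
```

Passing the Trainer in as a callable keeps the routing logic independent of how Solvers are evolved. The Idle-time and Failure Managers are omitted from this sketch.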

The Trainer has been used for solving several interesting and difficult problems: the even-parity problem and 22 real-world problems taken from PROBEN1 [41].

It is difficult to compare the GP-AS system with other problem solvers because the experimental conditions are different. Comparisons between GP and other techniques have been performed previously by other authors [10], [17]. Here we have performed a rough comparison for the numerical experiments required to evolve Solvers. The results obtained by GP-AS are generally worse than those obtained by systems that use a fixed population size and a fixed maximal tree height, although there are a few cases where the GP-AS Trainer performs better than standard GP. However, the comparison is not fair, because GP-AS uses an adaptive mechanism for population size and chromosome size and therefore does not know the optimal values of these parameters for a given problem. Other systems, by contrast, use values found through several preliminary experimental trials.

The paper is organized as follows: related work is reviewed in Section 2. The extension of Genetic Programming used for training Solvers is described in Section 3. The way in which GP can easily handle multiple outputs is discussed in detail in Section 3.2. Fitness assignment for regression and classification problems is described in Sections 3.3.1 and 3.3.2, respectively. The proposed GP-AS system is presented in Section 4. The structure of the input required by the GP-AS system is discussed in Section 4.1. The Decision Maker is described in Section 4.2. The Trainer and its underlying algorithm are presented in Section 4.3. The Idle-time and Failure Managers are described in Sections 4.6 and 4.7, respectively. Numerical experiments are performed and their results discussed in Section 5. Conclusions and future research directions are outlined in Section 6.

Section snippets

Related work

Developing automated problem solvers is one of the central themes of mathematics and computer science.

Most of these approaches were inspired by nature and by the human brain. In this section, we briefly review existing work in the field of general problem solvers and adaptive techniques.

Extending genetic programming

The most important part of the GP-AS system is the Trainer, which generates new Solvers from examples. Genetic programming [27], [28] is the underlying mechanism of the Trainer and is described in this section. We have enriched GP individuals with the ability to output multiple values.
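For readers unfamiliar with tree-based GP, the sketch below shows how a single-output GP individual is typically represented and evaluated. It is a generic illustration, not the authors' code: the function set, the protected-division convention, and the `Node`/`evaluate` names are assumptions, and the multiple-output extension mentioned above is not reproduced because its details are not given in this excerpt.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


def protected_div(a: float, b: float) -> float:
    """Division that returns 1.0 for a near-zero denominator (a common GP convention)."""
    return a / b if abs(b) > 1e-9 else 1.0


# Assumed function set: every function here is binary.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
}


@dataclass(frozen=True)
class Node:
    """A GP tree node: a function symbol with children, a variable "x<i>", or a constant."""
    symbol: Union[str, float]
    children: Tuple["Node", ...] = ()


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Recursively evaluates the tree on one input vector x."""
    if isinstance(node.symbol, (int, float)):
        return float(node.symbol)                # constant terminal
    if node.symbol.startswith("x"):
        return x[int(node.symbol[1:])]           # variable terminal, e.g. "x0"
    args = [evaluate(child, x) for child in node.children]
    return FUNCTIONS[node.symbol](*args)         # internal function node


# (x0 * x1) + 2.0 evaluated at x0 = 3, x1 = 4
tree = Node("+", (Node("*", (Node("x0"), Node("x1"))), Node(2.0)))
print(evaluate(tree, [3.0, 4.0]))  # 14.0
```

In a single-output setting like this one, the value computed at the root is the program's output.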

Detailed description of GP-autonomous solver

Our purpose is to build an autonomous system that does not rely on human intervention when solving problems. To achieve this, we must provide the system with as much information as possible. Section 4.1 describes what information the system needs and the format in which it must be supplied.

The system is organized as follows:

(i)
a Decision Maker (described in Section 4.2),
(ii)
a Trainer (described in Section 4.3),
(iii)
a Solver Repository (described in Section 4.4),
(iv)
an Idle-time Manager (described in

Numerical experiments

We performed several numerical experiments in which Solvers are created. The tested problems are:

•
Boolean function finding,
•
symbolic regression,
•
classification.
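The three problem types above differ mainly in how a program's output is scored. The exact fitness formulas used by GP-AS (Sections 3.3.1 and 3.3.2) are not included in this excerpt, so the sketch below assumes common GP choices: the number of hits on the even-parity truth table for Boolean function finding, mean absolute error for regression, and the percentage of misclassified examples, with an assumed output threshold of 0, for binary classification. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# A candidate program maps one input vector to a single output value.
Program = Callable[[Sequence[float]], float]


def even_parity_cases(n: int) -> List[Tuple[Tuple[int, ...], int]]:
    """All 2**n input vectors with the even-parity target (1 if the number of 1s is even)."""
    return [(bits, int(sum(bits) % 2 == 0)) for bits in product((0, 1), repeat=n)]


def boolean_hits(program: Program, n: int) -> int:
    """Number of truth-table rows the program gets right (higher is better)."""
    return sum(int(program(bits) == target) for bits, target in even_parity_cases(n))


def regression_error(program: Program,
                     data: Sequence[Tuple[Sequence[float], float]]) -> float:
    """Mean absolute error over the training data (lower is better)."""
    return sum(abs(program(x) - y) for x, y in data) / len(data)


def classification_error(program: Program,
                         data: Sequence[Tuple[Sequence[float], int]],
                         threshold: float = 0.0) -> float:
    """Percentage of misclassified examples; output > threshold means class 1."""
    wrong = sum(int((program(x) > threshold) != bool(label)) for x, label in data)
    return 100.0 * wrong / len(data)
```

Multi-class PROBEN1 problems would need a different decision rule than the single threshold assumed here.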

Regression and classification problems refer to several real-world problems taken from PROBEN1 [41] (which have been adapted from the UCI Machine Learning Repository [54]). Linear GP has previously been used [10] to solve problems from this set. For all the problems, we deal with one output only. Therefore, the root of the GP tree provides

Conclusions and further work

A system called GP-AS has been investigated in this paper. GP-AS has six main components: a Decision Maker, a Trainer, a Repository of Problem Solvers, a Repository Manager, an Idle-Time Manager and a Failure Manager. The decision on whether the system knows how to solve a problem or not is taken by the Decision Maker. The Trainer – which is the most important part of the system – is based on an adaptive variant of Genetic Programming. Each problem has its own Solver (a GP program) which is

Acknowledgments

The authors thank the anonymous reviewers for their useful suggestions. This research was supported by grant IDEI-543 from CNCSIS.

References (51)

J. Reed et al., Simulation of biological evolution and machine learning I. selection of self-reproducing numeric patterns by data processing machines, effects of hereditary control, mutation type and crossing, J. Theor. Biol. (1967)

P.J. Angeline et al., The evolutionary induction of subroutines

P.J. Angeline et al., Coevolving high-level representations

P.J. Angeline, Adaptive and self-adaptive evolutionary computations

P.J. Angeline, Two self-adaptive crossover operators for genetic programming, Advances in Genetic Programming 2 (1996)

P.J. Antsaklis et al., Towards intelligent autonomous control systems: architecture and fundamental issues, J. Intell. Robot. Syst. (1989)

T. Back, Self-adaptation in genetic algorithms

T. Back, The interaction of mutation rate, selection, and self-adaptation within a genetic algorithm

J.D. Bagley, The behavior of adaptive systems which employ genetic and correlation algorithms, PhD Thesis, University...

W. Banzhaf et al., Genetic Programming—An Introduction; On the Automatic Evolution of Computer Programs and its Applications (2001)

M. Brameier et al., A comparison of linear genetic programming and neural networks in medical data mining, IEEE Trans. Evol. Comput. (2001)

L. Breiman, Bias, variance, and arcing classifiers, Technical Report 460, Statistics Department, Berkeley,...

M.J. Colaco et al., Control of unsteady solidification via optimized magnetic fields, Mater. Manuf. Process. (2005)

M.J. Colaco, G.S. Dulikravich, Solidification of double-diffusive flows using thermo magneto-hydrodynamics and...

F. Corno et al., Exploiting auto-adaptive micro gp for highly effective test programs generation

L. Davis, Adapting probabilities in genetic algorithms

C.E. Rasmussen, R.M. Neal, G. Hinton, D. van Camp, M. Revow, Z. Ghahramani, K. Kustra, R. Tibshirani, Data for...

D. Edelman, A comparative study of neural network, genetic programming, and support-vector machine methods in forecasting financial time series

J. Eggermont et al., Adaptive genetic programming applied to new and existing simple regression problems, genetic programming

A.E. Eiben et al., Parameter control in evolutionary algorithms, IEEE Trans. Evol. Comput. (1999)

A.E. Eiben et al., SAW-ing EAs: adapting the fitness function for solving constrained problems

L.J. Fogel et al., A preliminary investigation on extending evolutionary programming to include self-adaptation on finite state machines, Informatica (1994)

J.H. Friedman, Multivariate adaptive regression splines, Ann. Stat. (1991)

D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (1989)

C. Grosan et al., Adaptive representation for single objective optimization, Soft Comput. (2005)

