Forecasting with computer-evolved model specifications: a genetic programming application

https://doi.org/10.1016/S0305-0548(02)00098-9

Abstract

This paper uses genetic programming (GP) to evolve model specifications of time series data. GP is a computerized random-search optimization algorithm that assembles equations until it identifies the fittest one. The technique is applied here first to artificially simulated data and then to real-world sunspot numbers. One-step-ahead forecasts produced by the fittest computer-evolved models are evaluated and compared with alternatives. The results suggest that GP may produce reasonable forecasts if the user selects appropriate input variables and understands the process under investigation. Further, the technique appears promising in forecasting noisy complex series, perhaps more so than other existing methods. It is suitable for decision makers who place a higher priority on obtaining accurate forecasts than on probing into and approximating the underlying data-generating process.

Scope and purpose

This paper contains a brief introduction to and an evaluation of the use of genetic programming (GP) in forecasting time series. GP is a computerized random-search optimization technique based on Darwin's theory of evolution. The algorithm is first applied to model and forecast artificially simulated linear and nonlinear time series. The results are used to evaluate the effectiveness of GP as a forecasting technique. It is then applied to model and forecast sunspot numbers, the most frequently analyzed and forecasted series. Autoregressive and threshold nonlinear dynamical specifications designed to capture the dynamics of the irregular sunspot cycle were tested using GP. The latter delivered estimated equations yielding the lowest mean square error yet reported for the series. This paper demonstrates that GP's forecasting capabilities depend on the structure and complexity of the process to be modeled. The skill and intuition of the GP user remain its main limitation.

Introduction

Accurate forecasting is an essential ingredient of many management decisions. Traditional forecasting techniques, with theoretical foundations in statistics and econometrics, rely on proper specification of systems that we assume are understood. Advances in technology have enabled non-statistical computerized techniques to compete with traditional forecasting methods in decision making. It is now more than two decades since artificial neural networks (ANN) captured our attention as a computational forecasting technique. From a forecaster's view, genetic programming (GP) may be considered another promising competitor. Koza [1] introduced a formal description of it and was the first to present GP as a computer program that ‘searches’ for symbolic regressions. From a statistical point of view, the program searches for model specifications that can replicate patterns in an observed series. A user of such a program first provides mathematical operators and values of variables as input files. Upon execution, the program randomly assembles many candidate representations, identifies the fittest one (e.g., the one with the lowest sum of squared errors, SSE), computes its fitted values and residual statistics, then prints the results to output files. There are already many applications of GP in forecasting. Recently, Sathyanarayan et al. [2] applied it to time series modeling. Chen et al. [3], Kaboudan [4], and Iba and Sasaki [5] used it to predict financial markets. Lee et al. [6] used it to forecast electric power demand. The resulting final equations seem to produce reasonably accurate predictions that may compare favorably with forecasts from humanly conceived specifications.
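The workflow just described can be illustrated with a minimal sketch in Python, assuming a toy random-search variant of GP (the paper itself used the GPQuick system listed in the references; operators, terminals, and population size here are illustrative): the program assembles many candidate expression trees over user-supplied operators and lagged terminals, then keeps the one with the lowest SSE.

```python
import random
import operator

# Toy sketch of GP-style symbolic regression (illustrative, not GPQuick):
# candidate models are expression trees over lagged inputs, and the
# fittest tree is the one with the lowest sum of squared errors (SSE).

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
TERMINALS = ['y1', 'y2', 1.0, 0.5]   # lags y[t-1], y[t-2] and constants

def random_tree(depth=3):
    """Randomly assemble an expression tree as nested tuples."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, y1, y2):
    """Evaluate a tree at the two lagged values."""
    if tree == 'y1':
        return y1
    if tree == 'y2':
        return y2
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, y1, y2), evaluate(right, y1, y2))

def sse(tree, series):
    """Sum of squared one-step-ahead errors of the tree on the series."""
    return sum((series[t] - evaluate(tree, series[t - 1], series[t - 2])) ** 2
               for t in range(2, len(series)))

def fittest(series, population=200):
    """Assemble a population of random trees and return the fittest."""
    trees = [random_tree() for _ in range(population)]
    return min(trees, key=lambda tr: sse(tr, series))

random.seed(0)
# Toy target: a noise-free AR(2)-like series y[t] = 0.5*y[t-1] - 0.5*y[t-2]
y = [1.0, 0.5]
for _ in range(40):
    y.append(0.5 * y[-1] - 0.5 * y[-2])

best = fittest(y)
print('best SSE:', sse(best, y))
```

A full GP run would then breed new generations from the fittest trees; here a single random generation stands in for the search.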

With encouraging results that are difficult to overlook, it is important to investigate GP as a forecasting algorithm. This paper evaluates forecasts produced by computer-evolved models that replicate the dynamics of experimental data as well as real-world time series. A brief description of the use of GP in forecasting is presented first. Descriptions of the simulated and real-world data follow. Measures to evaluate the fittest computer-evolved model specifications are introduced next, followed by the actual evaluation of the evolved specifications. Forecasts produced by these specifications are then discussed. A general assessment of GP as a forecasting technique precedes concluding remarks and suggestions for future research.

Section snippets

Computer-evolved specifications

Only a brief background of GP and of how it is used to evolve time series models is provided here. Detailed explanations of GP can be found in Koza [1] and Banzhaf et al. [7]. Generally, a GP algorithm is designed to optimize an objective function in a way that parallels Darwin's theory of natural selection and survival of the fittest. To search for and identify the fittest equation specification, the algorithm first randomly assembles a population consisting of a user-defined initial number of
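The Darwinian step described above (selection of the fittest, then breeding) can be illustrated with a toy tournament-selection and subtree-crossover routine. This is an assumption-laden sketch, not the paper's implementation; the tree encoding, the stand-in fitness, and all parameters are illustrative.

```python
import random

# Illustrative sketch (not the paper's code) of one Darwinian step of GP.
# Trees are nested lists like ['+', 'y1', 2.0]; "fitness" here is a
# stand-in where lower values are fitter.

def tournament(pop, fitness, k=2):
    """Tournament selection: return the fittest of k random candidates."""
    return min(random.sample(pop, k), key=fitness)

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node in a nested-list tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    copy = list(tree)
    copy[path[0]] = replace_at(copy[path[0]], path[1:], new)
    return copy

def crossover(a, b):
    """Graft a random subtree of b into a random position in a."""
    pa, _ = random.choice(list(subtrees(a)))
    _, sb = random.choice(list(subtrees(b)))
    return replace_at(a, pa, sb)

random.seed(1)
pop = [['+', ['*', 'y1', 2.0], 'y2'], ['-', 'y1', 0.5], ['+', 'y1', 1.0]]
size = lambda t: len(list(subtrees(t)))       # stand-in fitness: smaller tree
parent_a = tournament(pop, fitness=size)
parent_b = tournament(pop, fitness=size)
child = crossover(parent_a, parent_b)
print(child)
```

Repeating selection and crossover over many generations, while always retaining the fittest individuals, is what drives the population toward better specifications.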

The data

Two types of time series are used in this study: experimental and real. The first consists of seven simulated solutions of known data-generating processes. The second consists only of sunspot numbers, whose data-generating process is unknown. The simulated series represent processes with different levels of complexity to help evaluate GP's relative performance. They include one linear, two linear-stochastic with different signal-to-noise ratios, two nonlinear, and two nonlinear-stochastic also with
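The Henon map, against which the nonlinear forecasts are later benchmarked, is a standard example of such a deterministic nonlinear process. A sketch with the conventional parameters a = 1.4 and b = 0.3 follows; the paper's exact simulation settings are not shown in this excerpt, so these values and initial conditions are assumptions.

```python
# Hypothetical sketch of simulating the Henon map as a time series;
# parameters a = 1.4, b = 0.3 are the conventional chaotic values.

def henon_series(n, a=1.4, b=0.3, x0=0.1, y0=0.1):
    """Iterate x[t+1] = 1 - a*x[t]**2 + y[t], y[t+1] = b*x[t]."""
    x, y = x0, y0
    out = []
    for _ in range(n):
        out.append(x)
        x, y = 1.0 - a * x * x + y, b * x   # simultaneous update
    return out

series = henon_series(200)
print(series[:3])
```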

Evaluation measures

Three statistics are used to evaluate evolved models: mean square error (MSE), normalized mean square error (NMSE), and an α-statistic. MSE is a common measure but cannot be used to compare two variables with different units of measurement. NMSE is a reasonable alternative, with estimates benchmarked to the series’ mean, Ȳ:

NMSE = MSE / Var(Y) = [T⁻¹ Σ(Yt − Ŷt)²] / [(T − 1)⁻¹ Σ(Yt − Ȳ)²].

The third metric (the α-statistic) benchmarks GP estimates (or forecasts) against random walk estimates (or forecasts). The statistic
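The first two measures can be sketched directly from the definitions above. The α-statistic's exact formula is cut off in this excerpt, so only the random-walk benchmark it relies on is computed here, as an assumption.

```python
# Sketch of the evaluation measures defined above. MSE and NMSE follow the
# stated formulas; the alpha-statistic's exact form is truncated in the
# text, so only the random-walk benchmark it compares against appears here.

def mse(actual, fitted):
    """Mean square error: T^-1 * sum (Yt - Yhat_t)^2."""
    t = len(actual)
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / t

def nmse(actual, fitted):
    """Normalized MSE: MSE divided by the sample variance of Y."""
    t = len(actual)
    ybar = sum(actual) / t
    var = sum((a - ybar) ** 2 for a in actual) / (t - 1)
    return mse(actual, fitted) / var

def random_walk_mse(series):
    """MSE of the naive random-walk predictor Yhat_t = Y_{t-1}."""
    return mse(series[1:], series[:-1])

y = [1.0, 2.0, 3.0, 4.0]
print(mse(y, [1.0, 2.0, 3.0, 5.0]))   # 0.25
print(random_walk_mse(y))             # 1.0
```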

Evaluation of evolved specifications

Evaluation statistics for the nine evolved models appear in Table 1. The evolved models are included in Appendix A since, as mentioned earlier, they are difficult to interpret and analyzing them is beyond the scope of this paper. The first seven columns of the table contain statistics for the simulated series; the last two represent sunspot numbers. The information in Table 1 suggests that GP was successful in approximating the dynamics of the seven experimental processes according to MSE,

The forecasts

This section contains one-step-ahead forecasts for each series and their comparison with alternatives. Alternative forecasts of the linear and linear-stochastic processes are produced after applying Akaike's criterion to determine the appropriate lags. An AR(2) specification fits all three. The forecast obtained for the Henon map is compared with the best identified in prior studies. Stern [17] reported the lowest forecast error using an artificial neural network (ANN). ANN forecasts of
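The AR(2) benchmark above can be sketched as ordinary least squares on two lags plus an intercept, followed by a one-step-ahead forecast. This is a generic illustration, not the authors' estimation code, and the Akaike-based lag selection is omitted (AR(2) is taken as given per the text).

```python
# Generic sketch (not the authors' code): fit y[t] = c + p1*y[t-1] + p2*y[t-2]
# by ordinary least squares via the normal equations, then forecast one step.

def solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    A = [row[:] for row in A]
    b = b[:]
    n = 3
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][c] * x[c] for c in range(i + 1, n))) / A[i][i]
    return x

def ar2_fit(y):
    """OLS estimates (c, p1, p2) from the normal equations X'X beta = X'y."""
    rows = [(1.0, y[t - 1], y[t - 2]) for t in range(2, len(y))]
    targets = y[2:]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(3)]
    return solve3(XtX, Xty)

def ar2_forecast(y, coefs):
    """One-step-ahead forecast from the fitted AR(2)."""
    c, p1, p2 = coefs
    return c + p1 * y[-1] + p2 * y[-2]

# Noise-free AR(2) data: the fit should recover c=0.2, p1=0.5, p2=-0.3
y = [1.0, 0.5]
for _ in range(30):
    y.append(0.2 + 0.5 * y[-1] - 0.3 * y[-2])
coefs = ar2_fit(y)
print([round(v, 4) for v in coefs])
```

On noisy data the recovered coefficients would differ from the true ones, and the one-step forecast would carry the corresponding error.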

Evaluation of GP's performance

From a user's view, computer-evolved specifications differ from conventional statistical methods. First, with GP there is little control over the final model specification; it is the result of random selection. One may influence the evolved equations only by adding or deleting the operators or terminals the program selects from, but the fittest final equation specification is always unknown before executing a GP program. Second, reproduction of a fittest equation may be possible for data

Concluding remarks

The purpose of this study was to evaluate GP as a forecasting technique. It was used to evolve models and forecast future values of seven simulated data sets with different structural characteristics and of sunspot numbers. The results show that for estimation of linear and linear-stochastic systems GP may not be of much value. It fails to deliver the true specification, and current standard statistical methods such as linear regression as well as time series ARIMA models produce equally

Dr. M.A. Kaboudan, associate professor of Statistics, joined the University of Redlands in 2001. Between 1985 and 2001, he was associate professor of MS&IS at Penn State. His Ph.D. is in econometrics from WVU. His research areas of interest include forecasting, genetic programming, complexity and nonlinearity, financial market analysis, queuing theory, and macro-econometric and energy modeling. His work has been published in Journal of Forecasting, Computational Economics, Fuzzy Sets and Systems, Journal of Applied Statistics, Computers and Operations Research, and Journal of Economic Dynamics and Control, among others.

References (20)

  • D. Lee et al. Genetic programming model for long-term forecasting of electric power demand. Electric Power Systems Research (1997).
  • M. Kaboudan. Statistical properties of fitted residuals from genetically evolved models. Journal of Economic Dynamics and Control (2001).
  • J. Koza. Genetic programming: on the programming of computers by means of natural selection (1992).
  • Sathyanarayan S, Birru H, Chellapilla K. Evolving nonlinear time-series models using evolutionary programming....
  • S. Chen et al. Option pricing with genetic programming.
  • M. Kaboudan. A measure of time series predictability using genetic programming applied to stock returns. Journal of Forecasting (1998).
  • Iba H, Sasaki T. Using genetic programming to predict financial data. Proceedings of the 1999 Congress of Evolutionary...
  • W. Banzhaf et al. Genetic programming: an introduction (1998).
  • Singleton A. GPQuick: Simple GP system implemented in C++,...
  • M. Henon. A two-dimensional mapping with a strange attractor. Communications in Mathematical Physics (1976).
There are more references available in the full text version of this article.

