Forecasting with computer-evolved model specifications: a genetic programming application
Introduction
Accurate forecasting is an essential ingredient of many management decisions. Traditional forecasting techniques, with theoretical foundations in statistics and econometrics, rely on proper specification of systems that are assumed to be understood. Advances in technology have enabled non-statistical, computerized techniques to compete with traditional forecasting methods in decision-making. It is now more than two decades since artificial neural networks (ANN) captured our attention as a computational forecasting technique. From a forecaster's view, genetic programming (GP) may be considered another promising competitor. Koza [1] introduced a formal description of it and was the first to present GP as a computer program that 'searches' for symbolic regressions. From a statistical point of view, the program searches for model specifications that can replicate patterns in an observed series. A user of such a program first provides mathematical operators and values of variables as input files. Upon execution, the program randomly assembles many candidate representations, identifies the fittest one (e.g., the one with the lowest sum of squared errors, SSE), computes its fitted values and residual statistics, then prints the results to output files. There are already many applications of GP in forecasting. Recently, Sathyanarayan et al. [2] applied it to time series modeling. Chen et al. [3], Kaboudan [4], and Iba and Sasaki [5] used it to predict financial markets. Lee et al. [6] used it to forecast electric power demand. The resulting final equations seem to produce reasonably accurate predictions that may compare favorably with forecasts from humanly conceived specifications.
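The search described above can be sketched in a few lines of Python. This is a toy illustration, not the program used in the paper: it performs only the random-assembly and fittest-selection steps (no crossover or mutation), and the operator set, constant range, and population size are arbitrary assumptions.

```python
import random

# Toy sketch of GP-style symbolic regression reduced to two core steps:
# randomly assemble candidate expressions from user-supplied operators and
# terminals, then keep the one with the lowest SSE. All parameters here are
# illustrative assumptions, not those of any particular GP system.

OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def random_expr(depth=3):
    """Randomly assemble an expression tree over {x, constants} and OPS."""
    if depth == 0 or random.random() < 0.3:
        return ("x",) if random.random() < 0.5 else ("c", random.uniform(-1, 1))
    op = random.choice(list(OPS))
    return (op, random_expr(depth - 1), random_expr(depth - 1))

def evaluate(tree, x):
    """Compute the tree's fitted value at x."""
    if tree[0] == "x":
        return x
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def sse(tree, xs, ys):
    """Fitness: sum of squared errors of the tree's fitted values."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))

def fittest(xs, ys, population=300):
    """Assemble `population` random trees and return the fittest one."""
    trees = [random_expr() for _ in range(population)]
    return min(trees, key=lambda t: sse(t, xs, ys))
```

A full GP run would then evolve this population over many generations rather than stop at the initial random draw.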
With encouraging results that are difficult to overlook, it is important to investigate GP as a forecasting algorithm. This paper evaluates forecasts produced by computer-evolved models that replicate the dynamics of experimental data as well as real-world time series. A brief description of the use of GP in forecasting is presented first. Descriptions of the simulated and real-world data follow. Measures used to evaluate the fittest computer-evolved model specifications are introduced next, followed by an evaluation of the evolved specifications. Forecasts produced by these specifications are then discussed. A general assessment of GP as a forecasting technique precedes concluding remarks and suggestions for future research.
Section snippets
Computer-evolved specifications
Only a brief background on GP and how it is used to evolve time series models is provided here. Detailed explanations of GP can be found in Koza [1] and Banzhaf et al. [7]. Generally, a GP algorithm is designed to optimize an objective function in a way that parallels Darwin's theory of natural selection and survival of the fittest. To search for and identify the fittest equation specification, the algorithm first randomly assembles a population consisting of a user-defined initial number of
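The Darwinian loop itself might be sketched as follows, under assumed parameters: tournament selection plus subtree crossover over expression trees encoded as nested lists. Mutation and the many refinements of a full GP system such as Koza's are omitted for brevity; the operator set, tournament size, and size cap are illustrative assumptions.

```python
import copy
import random

# Hedged sketch of a generational GP loop: trees are nested lists
# [op, left, right] with terminals "x" or a float constant.

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def grow(depth):
    """Randomly grow an expression tree of bounded depth."""
    if depth == 0 or random.random() < 0.3:
        return "x" if random.random() < 0.5 else random.uniform(-1, 1)
    return [random.choice(list(OPS)), grow(depth - 1), grow(depth - 1)]

def ev(t, x):
    if t == "x":
        return x
    if isinstance(t, float):
        return t
    return OPS[t[0]](ev(t[1], x), ev(t[2], x))

def sse(t, data):
    return sum((ev(t, x) - y) ** 2 for x, y in data)

def size(t):
    return 1 + size(t[1]) + size(t[2]) if isinstance(t, list) else 1

def nodes(t):
    """All swappable subtree slots, as (parent, child_index) pairs."""
    out = []
    if isinstance(t, list):
        for i in (1, 2):
            out.append((t, i))
            out.extend(nodes(t[i]))
    return out

def crossover(a, b, max_nodes=60):
    """Swap a random subtree of b into a copy of a; cap bloat at max_nodes."""
    child, donor = copy.deepcopy(a), copy.deepcopy(b)
    slots_c, slots_d = nodes(child), nodes(donor)
    if slots_c and slots_d:
        (pc, ic), (pd, jd) = random.choice(slots_c), random.choice(slots_d)
        pc[ic] = pd[jd]
    return child if size(child) <= max_nodes else copy.deepcopy(a)

def evolve(data, pop_size=100, generations=10):
    """Survival of the fittest: tournament selection drives reproduction."""
    pop = [grow(3) for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            return min(random.sample(pop, 4), key=lambda t: sse(t, data))
        pop = [crossover(tournament(), tournament()) for _ in range(pop_size)]
    return min(pop, key=lambda t: sse(t, data))
```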
The data
Two types of time series are used in this study: experimental and real. The first consists of seven simulated solutions of known data-generating processes. The second consists of sunspot numbers, whose data-generating process is unknown. The simulated series represent processes with different levels of complexity to help evaluate GP's relative performance. They include one linear, two linear-stochastic with different signal-to-noise ratios, two nonlinear, and two nonlinear-stochastic also with
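The paper's actual generating equations are not reproduced in this snippet, so the generators below are only illustrative stand-ins for the kinds of processes described: a (possibly noisy) AR(2) recursion for the linear and linear-stochastic cases, and the Hénon map, a standard nonlinear chaotic benchmark, with its usual parameters a = 1.4, b = 0.3. All coefficients are assumptions.

```python
import random

def linear_ar2(n, noise_sd=0.0, seed=0):
    """AR(2)-type series y_t = 0.6*y_{t-1} - 0.3*y_{t-2} + e_t.
    Coefficients are hypothetical; noise_sd = 0 gives the purely linear case,
    noise_sd > 0 a linear-stochastic one."""
    rng = random.Random(seed)
    y = [0.1, 0.2]
    for _ in range(n - 2):
        y.append(0.6 * y[-1] - 0.3 * y[-2] + rng.gauss(0, noise_sd))
    return y

def henon(n, a=1.4, b=0.3):
    """Henon map in scalar form: x_t = 1 - a*x_{t-1}^2 + b*x_{t-2}."""
    x = [0.1, 0.1]
    for _ in range(n - 2):
        x.append(1 - a * x[-1] ** 2 + b * x[-2])
    return x
```

Varying `noise_sd` changes the signal-to-noise ratio, mirroring the two linear-stochastic variants mentioned above.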
Evaluation measures
Three statistics are used to evaluate the evolved models: mean square error (MSE), normalized mean square error (NMSE), and an α-statistic. MSE is a common measure but cannot be used to compare two variables with different units of measurement. NMSE is a reasonable alternative that benchmarks estimates against the series' mean: NMSE = Σ_t (y_t − ŷ_t)² / Σ_t (y_t − ȳ)², where ȳ is the mean of the observed series. The third metric (the α-statistic) benchmarks GP estimates (or forecasts) against random-walk estimates (or forecasts). The statistic
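The three measures can be computed as follows. The exact form of the α-statistic is cut off in the snippet above, so the ratio used here (model MSE over random-walk MSE, where the random walk predicts each value by its predecessor) is an assumption consistent with the description; values below 1 would then mean the model beats the naive benchmark.

```python
def mse(actual, predicted):
    """Mean square error; unit-dependent, so not comparable across series."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def nmse(actual, predicted):
    """MSE normalized by the series' variance about its mean."""
    mean = sum(actual) / len(actual)
    variance = sum((a - mean) ** 2 for a in actual) / len(actual)
    return mse(actual, predicted) / variance

def alpha_stat(actual, predicted):
    """Assumed form: model MSE relative to a random-walk benchmark."""
    rw = actual[:-1]  # random-walk prediction: the previous observed value
    return mse(actual[1:], predicted[1:]) / mse(actual[1:], rw)
```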
Evaluation of evolved specifications
Evaluation statistics of the nine evolved models are in Table 1. The evolved models themselves are listed in Appendix A since, as mentioned earlier, they are difficult to interpret and analyzing them is beyond the scope of this paper. The first seven columns of the table contain statistics for the simulated series; the last two are for sunspot numbers. The information in Table 1 suggests that GP was successful in approximating the dynamics of the seven experimental processes according to MSE,
The forecasts
This section contains one-step-ahead forecasts for each series and compares them with alternatives. Alternative forecasts of the linear and linear-stochastic processes are produced after applying Akaike's criterion to determine the appropriate lags. An AR(2) specification fits all three. The forecast obtained for the Hénon map is compared with the best identified in prior studies; Stern [17] reported the lowest forecast error using an artificial neural network. ANN forecasts of
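The linear benchmark described above (lag selection by Akaike's criterion, a least-squares AR fit, and a one-step-ahead forecast) might be sketched as follows. The candidate lag range and the AIC form used are conventional choices, not necessarily those of the paper.

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares AR(p) with intercept; returns coefficients and SSE."""
    Y = y[p:]
    X = np.column_stack(
        [np.ones(len(Y))] + [y[p - j - 1:len(y) - j - 1] for j in range(p)]
    )
    coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sse = float(np.sum((Y - X @ coefs) ** 2))
    return coefs, sse

def select_order_aic(y, max_p=6):
    """Pick the lag order minimizing a conventional AIC form."""
    n = len(y)
    best_p, best_aic = None, np.inf
    for p in range(1, max_p + 1):
        _, sse = fit_ar(y, p)
        aic = n * np.log(sse / n) + 2 * (p + 1)  # intercept counted as a parameter
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p

def forecast_one_step(y, p):
    """One-step-ahead forecast from the fitted AR(p)."""
    coefs, _ = fit_ar(y, p)
    lags = np.concatenate(([1.0], y[-1:-p - 1:-1]))  # [1, y_{t}, ..., y_{t-p+1}]
    return float(lags @ coefs)
```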
Evaluation of GP's performance
From a user's view, computer-evolved specifications differ from conventional statistical methods. First, with GP there is little control over the final model specification; it is the result of random selection. One may influence the evolved equations only by adding or deleting the operators or terminals the program selects from, but the fittest final equation specification is always unknown before executing a GP program. Second, reproduction of a fittest equation may be possible for data
Concluding remarks
The purpose of this study was to evaluate GP as a forecasting technique. It was used to evolve models and forecast future values of seven simulated data sets with different structural characteristics and of sunspot numbers. The results show that for estimation of linear and linear-stochastic systems GP may not be of much value. It fails to deliver the true specification, and current standard statistical methods such as linear regression as well as ARIMA time series models produce equally
References (20)
- Lee et al. Genetic programming model for long-term forecasting of electric power demand. Electric Power Systems Research (1997).
- Kaboudan MA. Statistical properties of fitted residuals from genetically evolved models. Journal of Economic Dynamics and Control (2001).
- Koza JR. Genetic programming: on the programming of computers by means of natural selection (1992).
- Sathyanarayan S, Birru H, Chellapilla K. Evolving nonlinear time-series models using evolutionary programming....
- Chen et al. Option pricing with genetic programming.
- Kaboudan MA. A measure of time series predictability using genetic programming applied to stock returns. Journal of Forecasting (1998).
- Iba H, Sasaki T. Using genetic programming to predict financial data. Proceedings of the 1999 Congress of Evolutionary...
- Banzhaf et al. Genetic programming: an introduction (1998).
- Singleton A. GPQuick: simple GP system implemented in C++,...
- Hénon M. A two-dimensional mapping with a strange attractor. Communications in Mathematical Physics (1976).
Dr. M.A. Kaboudan, associate professor of Statistics, joined the University of Redlands in 2001. Between 1985 and 2001, he was associate professor of MS&IS at Penn State. His Ph.D. is in econometrics from WVU. His research interests include forecasting, genetic programming, complexity and nonlinearity, financial market analysis, queuing theory, and macro-econometric and energy modeling. His work has been published in Journal of Forecasting, Computational Economics, Fuzzy Sets and Systems, Journal of Applied Statistics, Computers and Operations Research, and Journal of Economic Dynamics and Control, among others.