
Information Fusion

Volume 9, Issue 1, January 2008, Pages 41-55

A new boosting algorithm for improved time-series forecasting with recurrent neural networks

https://doi.org/10.1016/j.inffus.2006.10.009

Abstract

Ensemble methods for classification and regression have attracted a great deal of attention in recent years. They have been shown, both theoretically and empirically, to perform substantially better than single models in a wide range of tasks. We have adapted an ensemble method to the problem of predicting future values of time series using recurrent neural networks (RNNs) as base learners. The improvement is made by combining a large number of RNNs, each of which is generated by training on a different set of examples. This algorithm is based on the boosting algorithm, in which the difficult points of the time series receive more attention during the learning process; unlike the original algorithm, however, we introduce a new parameter for tuning the influence of boosting on the available examples. We test our boosting algorithm for RNNs on single-step-ahead and multi-step-ahead prediction problems. The results are then compared to those of other regression methods, including several local approaches. The overall results obtained through our ensemble method are more accurate than those obtained with the standard method, backpropagation through time, on these datasets, and remain significantly better even when long-range dependencies play an important role.

Introduction

The reliable prediction of future values of real-valued time series has many important applications ranging from ecological modeling to dynamic systems control, finance and marketing.

Modeling the system that generated the series is often the first step in setting up a forecasting system, since such a model can provide an estimate of future values based on past values. When the model consists of a set of deterministic equations and the initial conditions are known, it is possible, at least numerically, to solve the equations so that the evolution of the system can be determined.

In general, however, the characteristics of the phenomenon which generates the series are unknown. The information available for the prediction is limited to the past values of the series. The relations which describe the evolution must therefore be deduced from these values, in the form of approximations of the functional relation between past and future values.

The most frequently adopted approach to estimating the single-step-ahead (SS) future value x̂(t+1) consists in using a function f of the recent history of the time series, x̂(t+1) = f(x(t), x(t−1), x(t−2), …), where x(t), for 0 ≤ t ≤ l, is the time series data that can be used for building a model. In multi-step-ahead (MS) prediction, given {x(t), x(t−1), x(t−2), …}, a reasonable estimate x̂(t+h) of x(t+h) has to be found, h being the number of steps ahead.
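As a concrete illustration of how an MS estimate can be obtained from an SS model (a minimal sketch under our own assumptions, not code from the paper; the window length and the `predict_next` callable are hypothetical), a single-step predictor can be iterated h times, feeding each prediction back into the input window:

```python
import numpy as np

def iterated_ms_forecast(predict_next, history, h):
    """Multi-step-ahead forecast by iterating a single-step model:
    each estimate is appended to the window and fed back in, h times,
    yielding estimates of x(t+1), ..., x(t+h)."""
    window = list(history)               # the most recent observed values
    forecasts = []
    for _ in range(h):
        x_next = float(predict_next(np.asarray(window)))
        forecasts.append(x_next)
        window = window[1:] + [x_next]   # slide the fixed-length window
    return forecasts
```

Iterating in this way lets prediction errors accumulate over the h steps, which is one reason long-term forecasts tend to be less accurate than short-term ones.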

Building such models requires an appropriate choice of the function f.

Given their universal approximation properties, multi-layer perceptrons (MLPs [3]) are often successful in modeling nonlinear functions f. In this case, a fixed number p of past values is fed into the input layer of the MLP and the output is trained to predict a future value of the time series.

Using a time window of fixed size has proven to be limiting in many applications: if the time window is too narrow, important information may be left out, while if the window is too wide, useless inputs may act as distracting noise.

Ideally, for a given problem, the size of the time window should be adapted to the context. This can be done by using recurrent neural networks (RNNs) [3], [4], trained with a gradient-based algorithm such as backpropagation through time (BPTT). The results obtained can be improved by several means. One way is to develop a more appropriate training algorithm based on a priori information drawn from knowledge of the application field (see for example [5]).

It is also possible to adopt general methods that can improve the results of many different models.

One of these is known as ‘boosting’, which was introduced in [6].

In the boosting algorithm, the possibly small improvement that a “weak” model can obtain over a random estimate is substantially amplified through the sequential construction of several such models, each concentrating progressively on the difficult examples of the original learning set. In this paper we focus on the definition of a boosting algorithm for improving the prediction performance of RNNs. A new parameter is introduced, allowing the boosting effect to be regulated.

A common problem with time series forecasting models is the low accuracy of long-term forecasts. The estimated value of a variable may be reasonably reliable in the short term, but for longer-term forecasts the estimate is likely to become less accurate.

Yet while reliable MS time series prediction has many important applications and is often the intended outcome, the published literature usually considers SS time series prediction. The main reasons for this are the increased difficulty of problems requiring MS prediction and the fact that the results obtained by simple extensions of techniques developed for SS prediction are often disappointing. Moreover, while many different techniques perform rather similarly on SS prediction problems, significant differences show up when extensions of these techniques are applied to MS problems.

In this paper we will first briefly review the modelling approaches for MS prediction and then present existing work on the use of neural networks for MS prediction. In Section 3, ensemble methods will be reviewed before the generic boosting algorithm is presented, and in Section 4 the related work on regression will be presented.

Next, a definition of RNNs as well as the associated BPTT learning algorithm will be provided. The new boosting algorithm is described in Section 6. Finally, in Section 7, we will look at the results of experiments obtained on three benchmark tests for SS prediction problems as well as on two benchmark tests for MS prediction problems, all of which showed an overall improvement in performance.

Section snippets

Modelling approaches for time series

The most common approach to dealing with a prediction problem can be traced back to [7] and consists in using a fixed number M of past values (a fixed-length time window sliding over the time series) when building the prediction:

x(t) = [x(t), x(t−τ), …, x(t−(M−1)τ)],   (1)
x̂(t+τ) = f(x(t)).   (2)

Most of the current work on SS prediction relies on a result in [8] showing that under several assumptions (among them the absence of noise) it is possible to obtain a perfect estimate of x(t + τ) according to Eqs. (1), (2)
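For illustration only (a sketch under our own assumptions; the function names are hypothetical, not from the paper), the delay vectors of Eq. (1) and the prediction targets of Eq. (2) can be constructed as follows:

```python
import numpy as np

def delay_embed(x, M, tau):
    """Delay vectors [x(t), x(t-tau), ..., x(t-(M-1)tau)] for every t
    at which the full window is available (Eq. (1))."""
    x = np.asarray(x)
    t_start = (M - 1) * tau
    return np.array([x[t - np.arange(M) * tau] for t in range(t_start, len(x))])

def embed_with_targets(x, M, tau):
    """Pair each delay vector with its prediction target x(t + tau) (Eq. (2))."""
    X = delay_embed(x, M, tau)
    y = np.asarray(x)[(M - 1) * tau + tau:]
    return X[:len(y)], y
```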

Combining multiple learning methods

The combination of models (classifiers or regressors) is an effective way to improve model performance [31]. The goal of combining models is to obtain a more precise estimate than that obtained by a single model. Several effective methods have been put forward to improve the performance of a simple algorithm by combining several models.

The combination methods can be classified into three different groups: the voting methods, which include the bagging algorithm [32] and the boosting …

Boosting methods

Boosting methods belong to the family of model aggregation methods (see for example [38], [39]). These methods make it possible to go beyond the constraints inherent in selecting a single model, and they apply to base models whose results are at least slightly better than random guessing. The goal is to obtain an aggregate model whose error is smaller than those obtained by the individual models. The basic idea is to increase diversity so as to maximize the coverage of the data space.
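As a minimal example of aggregating individual estimates into a single ensemble prediction (the weighted-median rule shown here is a common choice and an assumption of this sketch, not necessarily the combination rule used in the paper):

```python
import numpy as np

def weighted_median(predictions, weights):
    """Combine the outputs of several regressors by a weighted median:
    sort the predictions and return the first one whose cumulative
    weight reaches half of the total weight."""
    predictions = np.asarray(predictions)
    order = np.argsort(predictions)
    cum_w = np.cumsum(np.asarray(weights)[order])
    idx = np.searchsorted(cum_w, 0.5 * cum_w[-1])
    return predictions[order][idx]
```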

Recurrent neural networks

RNNs are characterized by the presence of cycles in the graph of interconnections, and are able to model temporal dependencies of unspecified duration between the inputs and the associated desired outputs by using internal memory. Unlike in an MLP, the passage of information from one neuron to another through a connection is not instantaneous (it takes one time step), and the presence of loops thus makes it possible to retain the influence of the information for a variable time period, theoretically
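A minimal sketch of the recurrent state update that provides this internal memory (a simple Elman-style network written for illustration; it is not the specific architecture used in the paper):

```python
import numpy as np

class SimpleRNN:
    """Elman-type recurrent network: the hidden state is fed back at each
    step, so every output depends on the whole past of the input sequence."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))

    def forward(self, inputs):
        h = np.zeros(self.W_rec.shape[0])
        outputs = []
        for x in inputs:  # one time step per input
            h = np.tanh(self.W_in @ np.atleast_1d(x) + self.W_rec @ h)
            outputs.append(self.W_out @ h)
        return np.array(outputs)
```

In BPTT, such a network is unfolded over the time steps of the sequence and the gradient of the error is propagated back through the unfolded copies.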

Boosting recurrent neural networks

Boosting is a general method for improving the accuracy of any given learning algorithm. It produces a final solution by combining the rough and moderately inaccurate decisions offered by different classifiers, each of which is at least slightly better than random guessing. In boosting, the training set used for each classifier is produced (weighted) based on the performance of the earlier classifier(s) in the series. Therefore, samples incorrectly classified by previous classifiers in the series are more
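The sketch below shows one possible shape of such a boosting loop adapted to regression; the resampling scheme, the loss normalization, the weight update, and the parameter k regulating how strongly the distribution concentrates on difficult examples are assumptions made for illustration, not the paper's exact algorithm.

```python
import numpy as np

def boost_regressors(train_weak, X, y, n_rounds=10, k=1.0, seed=0):
    """Illustrative boosting loop for regression: each round trains a weak
    learner on a sample drawn according to the current weights, then
    increases the relative weight of the examples with the largest errors.
    train_weak(X, y) must return a callable model; k tempers how strongly
    the distribution concentrates on difficult examples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    weights = np.full(n, 1.0 / n)
    models, model_weights = [], []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, replace=True, p=weights)
        model = train_weak(X[idx], y[idx])
        errors = np.abs(model(X) - y)
        loss = errors / (errors.max() + 1e-12)   # normalized loss in [0, 1]
        avg_loss = float(np.sum(weights * loss))
        if avg_loss >= 0.5:                      # learner too weak: stop
            break
        beta = avg_loss / (1.0 - avg_loss)
        # easy examples (small loss) are down-weighted more; k tunes the effect
        weights *= beta ** (k * (1.0 - loss))
        weights /= weights.sum()
        models.append(model)
        model_weights.append(np.log(1.0 / beta))
    return models, model_weights
```

The resulting ensemble forecast can then be formed, for example, with the weighted-median combination sketched earlier.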

Experimental results

The boosting algorithm described was used with the BPTT learning algorithm for SS and MS time series forecasting problems. We added several new results to a previous study [53] on the SS problem and also added a new study on the MS problem.

The first set of experiments was carried out in order to explore the performance of the constructive algorithm and to study the influence of parameter k on its behaviour. The algorithm is used on the sunspots time series and two Mackey-Glass time series (MG17 and

Conclusion and future work

We adapted boosting to the problem of learning time-dependencies in sequential data for predicting future values, adding a new parameter for tuning the boosting influence and using recurrent neural networks as “weak” regressors.

The experimental results that we have obtained show that the boosting algorithm effectively improves performance compared with the use of a single RNN.

We first compared our results on SS with the results obtained from other combination methods. Results

References (63)

  • D.E. Rumelhart et al., Learning Internal Representations by Error Propagation
  • R.J. Williams et al., A learning algorithm for continually running fully recurrent neural networks, Neural Computation (1989)
  • R.E. Schapire, The strength of weak learnability, Machine Learning (1990)
  • G.U. Yule, On a method of investigating periodicity in disturbed series with special reference to Wolfer’s sunspot numbers, Philosophical Transactions of the Royal Society of London Series A (1927)
  • F. Takens, Detecting Strange Attractors in Turbulence
  • A. Aussem, Sufficient conditions for error backflow convergence in dynamical recurrent neural networks, Neural Computation (2002)
  • B. Hammer et al., Recurrent neural networks with small weights implement definite memory machines, Neural Computation (2003)
  • J. Vesanto, Using the SOM and Local Models in Time-Series Prediction, in: Proceedings of the Workshop on...
  • L. Chudy et al., Prediction of chaotic time-series using dynamic cell structures and local linear models, Neural Network World (1998)
  • F. Gers, D. Eck, J. Schmidhuber, Applying LSTM to Time Series Predictable Through Time-Window Approaches, in:...
  • N.G. Pavlidis, D.K. Tasoulis, M.N. Vrahatis, Time Series Forecasting Methodology for Multiple-Step-Ahead Prediction,...
  • J. Walter, H. Ritter, K.J. Schulten, Non-linear Prediction with Self-organizing Feature Maps, in: Proceedings of the...
  • T. Martinez et al., Neural-gas network for vector quantization and its application to time-series prediction, IEEE Transactions on Neural Networks (1993)
  • A.D. Back et al., Stabilization Properties of Multilayer Feedforward Networks with Time-Delays Synapses
  • E.A. Wan, Time Series Prediction by Using a Connection Network with Internal Delay Lines
  • T. Czernichow et al., Short term electrical load forecasting with artificial neural networks, Engineering Intelligent Systems (1996)
  • T. Lin et al., Learning long-term dependencies in NARX recurrent neural networks, IEEE Transactions on Neural Networks (1996)
  • S. El Hihi et al., Hierarchical Recurrent Neural Networks for Long-Term Dependencies
  • R. Boné, M. Crucianu, An Evaluation of Constructive Algorithms for Recurrent Networks on Multi-Step-Ahead Prediction,...
  • A.F. Atiya et al., A comparison between neural network forecasting techniques – Case study: River flow forecasting, IEEE Transactions on Neural Networks (1999)
  • J.A.K. Suykens et al., Learning a simple recurrent neural state space model to behave like Chua’s double scroll, IEEE Transactions on Circuits and Systems-I (1995)