Neurocomputing

Volume 28, Issues 1–3, October 1999, Pages 145-156

Time-series forecasting through wavelets transformation and a mixture of expert models

https://doi.org/10.1016/S0925-2312(98)00120-9

Abstract

This paper describes a system formed by a mixture of expert models (MEM) for time-series forecasting. We deal with several different competing models, such as partial least squares, K-nearest neighbours and carbon copy. The input space, after a change of basis via the Haar wavelet transform, is partitioned into disjoint regions by a clustering algorithm. For each region, a benchmark is performed among the competing models to select the most adequate one. MEM improved forecasting performance compared with the single models, as demonstrated experimentally on two different time series: laser data and exchange-rate data.

Introduction

Many potential applications of predictive models are related to forecasting discrete time series. A discrete time series is a finite or infinite enumerable set of observations of a given variable z(t), ordered by the time parameter and denoted z1,z2,…,zN, where N is the size of the time series. A key assumption frequently adopted in time-series forecasting is that the statistical properties of the data generator are time independent. The goal is, given part of a time series zt,zt−1,…,zt−w+1 (called a pattern), to predict a future value zt+h, where w is the length of the window on the time series and h≥1 is called the prediction horizon. We focus on models that exploit the regularities of the patterns observed in the past to accurately predict the short-term evolution of the system (small h).

Earlier research efforts on time-series forecasting focused on linear models typified by ARMA models. Two crucial developments appeared around 1980 [4]:

  1. The state-space reconstruction paradigm: time-delay embedding drew on ideas from differential topology and dynamical systems to provide a technique for recognizing when a time series has been generated by deterministic governing equations and, if so, for understanding the geometrical structure underlying the observed behaviour.

  2. The emergence of the field of machine learning, typified by neural networks, which can adaptively explore a large space of potential models [1], [9].

In the first case, where an experimentally observed quantity arises from deterministic equations, the idea is to use time-delay embedding to recover a representation of the relevant internal degrees of freedom of the system from the observable [4]. Although the precise values of these reconstructed variables are not meaningful (because of the unknown change of coordinates), they can be used to make precise forecasts since the embedded maps preserve their geometric structure.
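The reconstruction described above can be sketched in a few lines. The snippet below (a minimal illustration assuming NumPy; `delay_embed` is an illustrative helper, not code from the paper) stacks delayed copies of a scalar series into state vectors:

```python
import numpy as np

def delay_embed(z, w, tau=1):
    """Build delay vectors [z_t, z_{t-tau}, ..., z_{t-(w-1)tau}]
    for every valid index t of the scalar series z."""
    n = len(z) - (w - 1) * tau
    return np.array([z[t:t + (w - 1) * tau + 1:tau][::-1] for t in range(n)])

# A noiseless sine wave traces out a closed curve in the
# reconstructed two-dimensional state space, even though each
# delay coordinate alone is just a shifted copy of the signal.
z = np.sin(np.linspace(0, 8 * np.pi, 400))
X = delay_embed(z, w=2, tau=10)
print(X.shape)  # (390, 2)
```

The geometry of the embedded trajectory, not the coordinate values themselves, is what carries predictive information, which is why the unknown change of coordinates mentioned above does not matter.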

In the second case, the idea is to make function approximations by using an adaptive model, such as a connectionist network, to learn to emulate the input–output behavior. By moving a window along the discrete time series, we can create a training data set consisting of many sets of input values (patterns) with the corresponding target values. Once the adaptive model has been trained, it can be presented with a pattern of observed values zt,zt−1,…,zt−w+1 and used to make a prediction for zt+h.
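The windowing procedure just described can be sketched as follows (an illustrative helper assuming NumPy; the name `make_patterns` and the most-recent-first ordering are our choices, not the paper's):

```python
import numpy as np

def make_patterns(z, w, h):
    """Slide a window of length w along z; pair each pattern
    (z_t, z_{t-1}, ..., z_{t-w+1}) with the target z_{t+h}."""
    X, y = [], []
    for t in range(w - 1, len(z) - h):
        X.append(z[t - w + 1:t + 1][::-1])  # most recent value first
        y.append(z[t + h])
    return np.array(X), np.array(y)

z = np.arange(10.0)               # toy series 0, 1, ..., 9
X, y = make_patterns(z, w=3, h=1)
print(X[0], y[0])                 # [2. 1. 0.] 3.0
```

Each row of `X` is one input pattern for the adaptive model and the matching entry of `y` is its target, so a series of length N yields N − w − h + 1 training pairs.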

One lesson from the research on time-series forecasting to date is that no single method is universally superior to the others. In this paper, we describe a method based on a mixture of expert models (MEM) for time-series forecasting. We consider the problem of learning a mapping whose form differs across regions of the input space. The idea is to construct a specific predictive model for each input-space region. To improve the quality of the information available to the models, we first perform a wavelet transformation of the time-series data.
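The Haar transform used for this change of basis admits a very compact implementation. The sketch below (assuming NumPy and an input length that is a power of two; the orthonormal scaling by √2 is one common convention, and the paper may differ in normalization) repeatedly splits a pattern into pairwise averages and differences:

```python
import numpy as np

def haar(x):
    """Full orthonormal Haar transform of a vector whose length is a
    power of two: at each level, the first half of the active block
    receives pairwise averages (smooth part) and the second half
    pairwise differences (detail part), both scaled by 1/sqrt(2)."""
    out = np.asarray(x, dtype=float).copy()
    n = len(out)
    while n > 1:
        a = (out[:n:2] + out[1:n:2]) / np.sqrt(2)   # approximation
        d = (out[:n:2] - out[1:n:2]) / np.sqrt(2)   # detail
        out[:n // 2], out[n // 2:n] = a, d
        n //= 2
    return out

p = np.array([4.0, 2.0, 5.0, 5.0])
print(haar(p))   # leading coefficient is the scaled mean: 8.0
```

Because the transform is orthonormal, it preserves the energy of the pattern while concentrating the slow trend into the leading coefficients, which is what makes the transformed patterns easier to cluster and predict.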

The remainder of the paper is organized as follows. Section 2 describes the MEM method, including an extensive description of partial least squares (PLS), which has been used intensively in our experiments. Section 3 contains experimental results on two time series distributed by the Santa Fe Institute [11]. Concluding remarks are given in Section 4.

Methodology

In this section we present MEM, a method for time-series forecasting based on a mixture of expert models.

MEM focuses on the problem of learning a mapping whose form differs across regions of the input space. Although a single homogeneous adaptive model could be applied to this problem, we might expect the task to be better performed if we assign different “expert” models to tackle each of the different regions, and then use an extra “gating” model to decide which expert should handle each input pattern.
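A minimal hard-gated sketch of this idea follows, assuming NumPy, precomputed cluster centroids (standing in for the clustering step), and two of the competing experts named in the abstract (K-nearest neighbours and carbon copy). All function names are illustrative, the per-region benchmark here uses in-region training error for brevity (a held-out set would be used in practice), and this is not the authors' implementation:

```python
import numpy as np

def knn_expert(Xtr, ytr, k=3):
    """K-nearest-neighbour forecaster: average the targets of the
    k training patterns closest to the query."""
    def predict(x):
        idx = np.argsort(np.linalg.norm(Xtr - x, axis=1))[:k]
        return ytr[idx].mean()
    return predict

def carbon_copy_expert(Xtr, ytr):
    """Carbon copy: repeat the most recent observed value
    (patterns are assumed to store the newest value first)."""
    return lambda x: x[0]

def fit_mem(X, y, centroids, experts):
    """Hard-gated mixture: assign each training pattern to its
    nearest centroid, then keep, per region, the expert with the
    lowest in-region squared error."""
    region = np.argmin(
        np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2), axis=1)
    chosen = []
    for r in range(len(centroids)):
        Xr, yr = X[region == r], y[region == r]
        fitted = [make(Xr, yr) for make in experts]
        errs = [np.mean([(f(x) - t) ** 2 for x, t in zip(Xr, yr)])
                for f in fitted]
        chosen.append(fitted[int(np.argmin(errs))])
    def predict(x):
        r = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
        return chosen[r](x)  # gate: route to the region's winner
    return predict

# Toy data: two well-separated regions favouring different experts.
X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
y = np.array([0.0, 0.1, 10.0, 10.2])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
f = fit_mem(X, y, centroids, [knn_expert, carbon_copy_expert])
print(f(np.array([0.05, 0.05])))  # carbon copy wins in region 0
```

The gate here is the nearest-centroid rule itself, so routing a new pattern costs only one distance computation per region.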

Experimental results

We have tested the MEM method on two time series taken from the Santa Fe Time Series Prediction Analysis Competition, held during the fall of 1990 under the auspices of the Santa Fe Institute [4], [11].

Concluding remarks

We have presented a system formed by a mixture of expert models (MEM) for time-series forecasting. Since there is no single predictive method universally superior to the others, we have expanded previous implementations of MEM by allowing different types of predictive models to participate in its constitution. After performing a change of basis with the Haar wavelet transform, the input space was partitioned into disjoint regions by a clustering algorithm, and for each region a benchmark was performed among the competing models to select the most adequate one.

References (11)

  • P. Geladi et al., Partial least squares regression: a tutorial, Anal. Chim. Acta (1986)
  • C.M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, ...
  • P.A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, Englewood Cliffs, NJ, ...
  • R. Gnanadesikan, Methods for Statistical Data Analysis of Multivariate Observations, Wiley, New York, ...
  • A.S. Weigend, N.A. Gershenfeld, Forecasting the Future and Understanding the Past, Addison-Wesley, Reading, MA, ...


Ruy Luiz Milidiú received the Ph.D. degree in operations research from the University of California, Berkeley. He is currently an Assistant Professor in the Informatics Department at the Pontifı́cia Universidade Católica do Rio de Janeiro, Brazil, where he also coordinates the Algorithms Engineering and Neural Networks Lab. His research activity is in data compression, neural networks and systems optimization.

Ricardo Machado received a D.Sc. in Computer Science from the Federal University of Rio de Janeiro, Brazil, in 1985. Until his untimely passing away in late 1997 he was with the Algorithms Engineering and Neural Networks Lab at the Catholic University of Rio de Janeiro, Brazil. Prior to that he was with the IBM Rio Scientific Center, where he conducted research on neural networks applications.

Raúl Renterı́a graduated in computer engineering in 1996 from the Pontifı́cia Universidade Católica of Rio de Janeiro, Brazil. He is currently pursuing the M.S. degree in the Informatics Department there, where he is also a research assistant at the Algorithms Engineering and Neural Networks Lab.

