Neurocomputing

Volume 28, Issues 1–3, October 1999, Pages 145-156

Time-series forecasting through wavelets transformation and a mixture of expert models

https://doi.org/10.1016/S0925-2312(98)00120-9

Abstract

This paper describes a system formed by a mixture of expert models (MEM) for time-series forecasting. We deal with several different competing models, such as partial least squares, K-nearest neighbours and carbon copy. The input space, after a change of basis via the Haar wavelet transform, is partitioned into disjoint regions by a clustering algorithm. For each region, a benchmark is performed among the competing models to select the most adequate one. MEM improved forecasting performance compared with the single models, as demonstrated experimentally on two different time series: laser data and exchange-rate data.

Introduction

Many potential applications of predictive models are related to forecasting discrete time series. A discrete time series is a finite or infinite enumerable set of observations of a given variable z(t), ordered by the time parameter and denoted z1,z2,…,zN, where N is the size of the time series. A key assumption frequently adopted in time-series forecasting is that the statistical properties of the data generator are time independent. The goal is, given part of a time series zt,zt−1,…,zt−w+1 (called a pattern), to predict a future value zt+h, where w is the length of the window on the time series and h≥1 is called the prediction horizon. We focus on models that exploit the regularities of the patterns observed in the past to accurately predict the short-term evolution of the system (small h).

Earlier research efforts on time-series forecasting focused on linear models typified by ARMA models. Two crucial developments appeared around 1980 [4]:

  1. The state-space reconstruction paradigm: time-delay embedding drew on ideas from differential topology and dynamical systems to provide a technique for recognizing when a time series has been generated by deterministic governing equations and, if so, for understanding the geometrical structure underlying the observed behaviour.

  2. The emergence of the field of machine learning, typified by neural networks, which can adaptively explore a large space of potential models [1], [9].

In the first case, where an experimentally observed quantity arises from deterministic equations, the idea is to use time-delay embedding to recover a representation of the relevant internal degrees of freedom of the system from the observable [4]. Although the precise values of these reconstructed variables are not meaningful (because of the unknown change of coordinates), they can be used to make precise forecasts since the embedded maps preserve their geometric structure.
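The reconstruction described above can be sketched in a few lines. The snippet below (a minimal illustration assuming NumPy; `delay_embed` is an illustrative helper, not code from the paper) stacks delayed copies of a scalar series into state vectors:

```python
import numpy as np

def delay_embed(z, w, tau=1):
    """Build delay vectors [z_t, z_{t-tau}, ..., z_{t-(w-1)tau}]
    for every valid index t of the scalar series z."""
    n = len(z) - (w - 1) * tau
    return np.array([z[t:t + (w - 1) * tau + 1:tau][::-1] for t in range(n)])

# A noiseless sine wave traces out a closed curve in the
# reconstructed two-dimensional state space, even though each
# delay coordinate alone is just a shifted copy of the signal.
z = np.sin(np.linspace(0, 8 * np.pi, 400))
X = delay_embed(z, w=2, tau=10)
print(X.shape)  # (390, 2)
```

The geometry of the embedded trajectory, not the coordinate values themselves, is what carries predictive information, which is why the unknown change of coordinates mentioned above does not matter.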

In the second case, the idea is to make function approximations by using an adaptive model, such as a connectionist network, to learn to emulate the input–output behavior. By moving a window along the discrete time series, we can create a training data set consisting of many sets of input values (patterns) with the corresponding target values. Once the adaptive model has been trained, it can be presented with a pattern of observed values zt,zt−1,…,zt−w+1 and used to make a prediction for zt+h.
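The windowing procedure just described can be sketched as follows (an illustrative helper assuming NumPy; the name `make_patterns` and the most-recent-first ordering are our choices, not the paper's):

```python
import numpy as np

def make_patterns(z, w, h):
    """Slide a window of length w along z; pair each pattern
    (z_t, z_{t-1}, ..., z_{t-w+1}) with the target z_{t+h}."""
    X, y = [], []
    for t in range(w - 1, len(z) - h):
        X.append(z[t - w + 1:t + 1][::-1])  # most recent value first
        y.append(z[t + h])
    return np.array(X), np.array(y)

z = np.arange(10.0)               # toy series 0, 1, ..., 9
X, y = make_patterns(z, w=3, h=1)
print(X[0], y[0])                 # [2. 1. 0.] 3.0
```

Each row of `X` is one input pattern for the adaptive model and the matching entry of `y` is its target, so a series of length N yields N − w − h + 1 training pairs.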

One lesson from the research on time-series forecasting to date is that no single method is universally superior to the others. In this paper, we describe a method based on a mixture of expert models (MEM) for time-series forecasting. We consider the problem of learning a mapping whose form differs across regions of the input space. The idea is to construct a specific predictive model for each input-space region. To improve the quality of the information available to the models, we first perform a wavelet transformation of the time-series data.
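The Haar transform used for this change of basis admits a very compact implementation. The sketch below (assuming NumPy and an input length that is a power of two; the orthonormal scaling by √2 is one common convention, and the paper may differ in normalization) repeatedly splits a pattern into pairwise averages and differences:

```python
import numpy as np

def haar(x):
    """Full orthonormal Haar transform of a vector whose length is a
    power of two: at each level, the first half of the active block
    receives pairwise averages (smooth part) and the second half
    pairwise differences (detail part), both scaled by 1/sqrt(2)."""
    out = np.asarray(x, dtype=float).copy()
    n = len(out)
    while n > 1:
        a = (out[:n:2] + out[1:n:2]) / np.sqrt(2)   # approximation
        d = (out[:n:2] - out[1:n:2]) / np.sqrt(2)   # detail
        out[:n // 2], out[n // 2:n] = a, d
        n //= 2
    return out

p = np.array([4.0, 2.0, 5.0, 5.0])
print(haar(p))   # leading coefficient is the scaled mean: 8.0
```

Because the transform is orthonormal, it preserves the energy of the pattern while concentrating the slow trend into the leading coefficients, which is what makes the transformed patterns easier to cluster and predict.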

The remainder of the paper is organized as follows. Section 2 describes the MEM method, including an extensive description of partial least squares (PLS), which has been used intensively in our experiments. Section 3 contains experimental results on two time series distributed by the Santa Fe Institute [11]. Concluding remarks are given in Section 4.

Methodology

In this section we present MEM, a method for time-series forecasting based on a mixture of expert models.

MEM focuses on the problem of learning a mapping whose form differs across regions of the input space. Although a single homogeneous adaptive model could be applied to this problem, we might expect the task to be better performed if we assign different “expert” models to tackle each of the different regions, and then use an extra “gating” model to decide which expert should handle each input pattern.
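A minimal hard-gated sketch of this idea follows, assuming NumPy, precomputed cluster centroids (standing in for the clustering step), and two of the competing experts named in the abstract (K-nearest neighbours and carbon copy). All function names are illustrative, the per-region benchmark here uses in-region training error for brevity (a held-out set would be used in practice), and this is not the authors' implementation:

```python
import numpy as np

def knn_expert(Xtr, ytr, k=3):
    """K-nearest-neighbour forecaster: average the targets of the
    k training patterns closest to the query."""
    def predict(x):
        idx = np.argsort(np.linalg.norm(Xtr - x, axis=1))[:k]
        return ytr[idx].mean()
    return predict

def carbon_copy_expert(Xtr, ytr):
    """Carbon copy: repeat the most recent observed value
    (patterns are assumed to store the newest value first)."""
    return lambda x: x[0]

def fit_mem(X, y, centroids, experts):
    """Hard-gated mixture: assign each training pattern to its
    nearest centroid, then keep, per region, the expert with the
    lowest in-region squared error."""
    region = np.argmin(
        np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2), axis=1)
    chosen = []
    for r in range(len(centroids)):
        Xr, yr = X[region == r], y[region == r]
        fitted = [make(Xr, yr) for make in experts]
        errs = [np.mean([(f(x) - t) ** 2 for x, t in zip(Xr, yr)])
                for f in fitted]
        chosen.append(fitted[int(np.argmin(errs))])
    def predict(x):
        r = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
        return chosen[r](x)  # gate: route to the region's winner
    return predict

# Toy data: two well-separated regions favouring different experts.
X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
y = np.array([0.0, 0.1, 10.0, 10.2])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
f = fit_mem(X, y, centroids, [knn_expert, carbon_copy_expert])
print(f(np.array([0.05, 0.05])))  # carbon copy wins in region 0
```

The gate here is the nearest-centroid rule itself, so routing a new pattern costs only one distance computation per region.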

Experimental results

We have tested the MEM method on two time series taken from the Santa Fe Time Series Prediction Analysis Competition, held during the fall of 1990 under the auspices of the Santa Fe Institute [4], [11].

Concluding remarks

We have presented a system formed by a mixture of expert models (MEM) for time-series forecasting. Since there is no single predictive method universally superior to the others, we have expanded previous implementations of MEM by allowing different types of predictive models to participate in its constitution. After performing a change of basis with the Haar wavelet transform, the input space was partitioned into disjoint regions by a clustering algorithm, and for each region a benchmark was performed among the competing models to select the most adequate one.

References (11)

  • P. Geladi et al., Partial least squares regression: a tutorial, Anal. Chim. Acta (1986)
  • C.M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, ...
  • P.A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, Englewood Cliffs, NJ, ...
  • R. Gnanadesikan, Methods for Statistical Data Analysis of Multivariate Observations, Wiley, New York, ...
  • A.S. Weigend, N.A. Gershenfeld, Forecasting the Future and Understanding the Past, Addison-Wesley, Reading, MA, ...


Ruy Luiz Milidiú received the Ph.D. degree in operations research from the University of California, Berkeley. He is currently an Assistant Professor in the Informatics Department at the Pontifı́cia Universidade Católica do Rio de Janeiro, Brazil, where he also coordinates the Algorithms Engineering and Neural Networks Lab. His research activity is in data compression, neural networks and systems optimization.

Ricardo Machado received a D.Sc. in Computer Science from the Federal University of Rio de Janeiro, Brazil, in 1985. Until his untimely passing away in late 1997 he was with the Algorithms Engineering and Neural Networks Lab at the Catholic University of Rio de Janeiro, Brazil. Prior to that he was with the IBM Rio Scientific Center, where he conducted research on neural networks applications.

Raúl Renterı́a graduated in computer engineering in 1996 from the Pontifı́cia Universidade Católica of Rio de Janeiro, Brazil. He is currently pursuing the M.S. degree in the Informatics Department there, where he is also a research assistant at the Algorithms Engineering and Neural Networks Lab.

