Multivariate time series modeling and classification via hierarchical VAR mixtures
Introduction
We propose a multivariate time series modeling approach based on the idea of mixing models through the paradigm known as hierarchical mixture-of-experts (HME) (Jordan and Jacobs, 1994). The HME approach easily allows for model mixing and permits the representation of the mixture weights as a function of time or other covariates. Our HME models assume that the components of the mixture are vector autoregressions (VAR). These models provide useful insight into the spatio-temporal characteristics of the data by modeling multiple time series jointly. In addition, the VAR-HME models can assess, in a probabilistic fashion, the different states of the multivariate time series over time by means of the estimated mixture weights.
Developments on univariate HME time series models can be found in Huerta et al. (2003). These authors show how to estimate the parameters of mixture-of-experts (ME) and HME models for univariate time series via the expectation-maximization (EM) algorithm and Markov chain Monte Carlo (MCMC) methods. Huerta et al. (2003) applied the HME methodology to model the monthly US industrial production index from 1947 to 1993. Specifically, an HME model to discriminate between stochastic trend models and deterministic trend models was considered. In this analysis, time was the only covariate included in the model. More recently, Villagran and Huerta (2006) showed that the inclusion of additional covariates leads to substantial changes in the estimates of some of the model parameters in univariate mixture-of-experts models. In particular, the authors consider ME models for stochastic volatility in a time series of returns where time and the Dow Jones index are both covariates.
We present an extension of the HME developments of Huerta et al. (2003) to handle multivariate time series. We propose a novel class of models in which the mixture components, usually called experts in the neural network terminology, are vector autoregressions. VAR-HME models extend the univariate mixture of autoregressive (AR) models presented in Wong and Li (2000) and Wong and Li (2001) to the multivariate framework. Related univariate models, in which single-layer stochastic neural networks are used to model non-linear time series, are also developed in Lai and Wong (2001). The hierarchical structure of the VAR-HME models developed here allows the construction of very flexible models to describe the non-stationarities and non-linearities often present in multiple time series. Such hierarchical structure is not present in the univariate models developed in Wong and Li (2000), Wong and Li (2001) and Lai and Wong (2001). In addition, we design and discuss an algorithm for selecting an optimal VAR-HME model configuration. A particular VAR-HME configuration is defined by the number of components in the hierarchical-mixture, i.e., the number of overlays and experts (or VARs), and the model orders of each VAR. Our algorithm uses the Bayesian information criterion (BIC) as an optimality criterion.
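To fix ideas, the sketch below simulates a bivariate series from a two-expert mixture of VAR(1) models whose mixture weights evolve over time through a logistic gating function. This is only an illustrative toy version of the model class, not the authors' code; the coefficient matrices, noise scale, and gating steepness are all assumptions chosen for a stable, well-behaved simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

T, k = 300, 2
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])    # VAR(1) coefficients, expert 1
A2 = np.array([[-0.6, 0.2], [0.1, -0.5]])  # VAR(1) coefficients, expert 2

def gate(t):
    """Probability of expert 2 at time t: logistic in (rescaled) time,
    so the series drifts from regime 1 to regime 2 around t = T/2."""
    return 1.0 / (1.0 + np.exp(-(t - T / 2) / 20.0))

# Simulate: at each step, draw the active expert from the gating
# probability, then advance the series under that expert's VAR(1).
y = np.zeros((T, k))
for t in range(1, T):
    A = A2 if rng.random() < gate(t) else A1
    y[t] = A @ y[t - 1] + 0.1 * rng.standard_normal(k)
```

A series generated this way is non-stationary by construction, which is the kind of behavior the VAR-HME class is designed to capture.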
The time series applications that motivate the VAR-HME modeling approach arise mainly in the area of biomedical signal processing, where multiple time series have two main characteristics. First, the series consist of multiple signals recorded simultaneously from a system under certain conditions. Second, each individual signal has an underlying structure, possibly but not necessarily quasi-periodic, that can adequately be modeled by a collection of univariate AR models, or by AR models with parameters that vary over time (TVAR). These are the characteristics of the multi-channel electroencephalogram (EEG) data analyzed in Section 4.2. The VAR-HME models constitute a new class of multivariate time series models that are non-linear and non-stationary and so they are suitable for modeling highly complex and non-stationary signals such as EEG traces. It is important to emphasize that the multivariate nature of the VAR-HME models developed here is a key feature. These models are able to capture latent processes that are common to several univariate time series by modeling them jointly. This could not be achieved by analyzing each series separately via univariate mixtures of AR models. Other potential areas of application for these models include seismic and speech signal processing and applications to environmental and financial data analysis.
The paper is organized as follows. Section 2 presents the mathematical formulation of the VAR-HME models and summarizes the EM algorithm for parameter estimation when the number of overlays, the number of models and the model orders of the VAR components are known. Section 3 describes an algorithm for selecting the number of overlays, models and model orders of the VARs using the Bayesian information criterion or BIC. Model checking issues are also discussed in Section 3. Section 4 presents the analyses of two datasets: a simulated data set and a 7-channel electroencephalogram data set. Finally, conclusions and future work are presented in Section 5.
Section snippets
Models and methodology
Let $\{\mathbf{y}_t\}_{t=1}^{T}$ be a collection of $T$ $k$-dimensional time series vectors, and let $\{\mathbf{x}_t\}_{t=1}^{T}$ be a collection of $T$ $l$-dimensional vectors of covariates indexed in time. Let the conditional probability density function (pdf) of $\mathbf{y}_t$ be $f(\mathbf{y}_t \mid \boldsymbol{\theta}, \mathcal{X}, \mathcal{F}_{t-1})$, where $\boldsymbol{\theta}$ is a parameter vector; $\mathcal{X}$ is the $\sigma$-field generated by $\{\mathbf{x}_t\}_{t=1}^{T}$, representing external information; and, for each $t$, $\mathcal{F}_{t-1}$ is the $\sigma$-field generated by $\{\mathbf{y}_{t-1}, \mathbf{y}_{t-2}, \ldots\}$, representing the previous history at time $t$. Typically, the conditional pdf is assumed to depend on
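As a concrete (and deliberately simplified) illustration of such a conditional pdf, the function below evaluates the conditional log-density of a single-layer mixture of Gaussian VAR(1) experts, with softmax (multinomial logistic) gating on a covariate vector. The parameter names (`A`, `Sigma`, `gammas`) are our own placeholders, not the paper's notation, and the hierarchical (multi-overlay) structure is omitted for brevity.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D array of gating scores."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mixture_logpdf(y_t, y_prev, x_t, experts, gammas):
    """log p(y_t | y_{t-1}, x_t) for a mixture of Gaussian VAR(1) experts.

    experts: list of (A, Sigma) pairs, one per expert;
    gammas:  list of gating coefficient vectors, one per expert."""
    k = len(y_t)
    weights = softmax(np.array([g @ x_t for g in gammas]))
    dens = 0.0
    for w, (A, Sigma) in zip(weights, experts):
        resid = y_t - A @ y_prev                 # innovation under this expert
        _, logdet = np.linalg.slogdet(Sigma)
        quad = resid @ np.linalg.solve(Sigma, resid)
        # Gaussian density, weighted by the gating probability
        dens += w * np.exp(-0.5 * (k * np.log(2 * np.pi) + logdet + quad))
    return np.log(dens)
```

Summing such terms over $t$ gives the conditional log-likelihood that an EM algorithm would maximize in this setting.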
Model selection
We now describe a general algorithm that searches for the optimal VAR-HME for a given data set. Optimality here will be defined in terms of the Bayesian information criterion or BIC (Schwarz, 1978). In other words, the optimal VAR-HME model configuration will be the one whose number of overlays, number of expert models, VAR model orders and associated parameter values minimize the BIC, which is defined in this framework as $\mathrm{BIC} = -2 \log p(\mathbf{y}_{1:T} \mid \hat{\boldsymbol{\theta}}) + d \log(T)$, where $d$ is the dimension
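The criterion itself can be sketched in a few lines. The toy example below computes BIC for plain least-squares VAR($p$) fits and picks the order minimizing it; the paper's actual algorithm searches over full VAR-HME configurations (overlays, experts, and their orders) fitted by EM, which we do not reproduce here, and the simulated data and parameter count convention are illustrative assumptions.

```python
import numpy as np

def var_bic(y, p):
    """BIC = -2 * loglik + d * log(n) for an OLS-fitted VAR(p)."""
    T, k = y.shape
    # Design matrix: y_t regressed on [1, y_{t-1}, ..., y_{t-p}]
    X = np.hstack([np.ones((T - p, 1))] +
                  [y[p - i - 1:T - i - 1] for i in range(p)])
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    n = T - p
    Sigma = resid.T @ resid / n                  # MLE innovation covariance
    _, logdet = np.linalg.slogdet(Sigma)
    loglik = -0.5 * n * (k * np.log(2 * np.pi) + logdet + k)
    d = B.size + k * (k + 1) / 2                 # coefficients + covariance
    return -2.0 * loglik + d * np.log(n)

# Simulate from a known VAR(1) and search over candidate orders.
rng = np.random.default_rng(1)
y = np.zeros((400, 2))
A = np.array([[0.6, 0.1], [0.0, 0.5]])
for t in range(1, 400):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)

best_p = min(range(1, 5), key=lambda p: var_bic(y, p))
```

The same minimize-BIC principle drives the VAR-HME search, with the candidate set enlarged to include the number of overlays and experts.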
Applications
In order to show the performance of the proposed models and search algorithm, two examples are considered. In the first example we analyze a simulated bivariate time series. In the second example we apply the VAR-HME models to 7-channel electroencephalogram data recorded during electroconvulsive therapy.
Conclusions and future directions
This paper advances multivariate time series methodology by combining hierarchical mixtures and vector autoregressive components. The VAR-HME models presented here constitute a time-domain approach to analyzing multivariate non-stationary time series. A multivariate frequency-domain approach appears in Ombao et al. (2005). One of the key features of our time-domain approach is that the estimates of the gating functions (i.e., the functions that define the weights of the mixture
References (23)
- Carvalho, A.X., Tanner, M.A., 2005. Mixtures-of-experts of autoregressive time series: asymptotic normality and model specification. IEEE Trans. Neural Networks.
- Chib, S., Jeliazkov, I., 2001. Marginal likelihood from the Metropolis-Hastings output. J. Amer. Statist. Assoc.
- Elerian, O., Chib, S., Shephard, N., 2001. Likelihood inference for discretely observed non-linear diffusions. Econometrica.
- Green, P.J., 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika.
- Huerta, G., Jiang, W., Tanner, M.A., 2001. Discussion article: mixtures of time series models. J. Comput. Graphical Statist.
- Huerta, G., Jiang, W., Tanner, M.A., 2003. Time series modeling via hierarchical mixtures. Statist. Sin.
- Huerta, G., Prado, R., 2006. Structure priors for multivariate time series. J. Statist. Plann. Inference, in press.
- Jordan, M.I., Jacobs, R.A., 1994. Hierarchical mixtures of experts and the EM algorithm. Neural Comput.
- Kim, S., Shephard, N., Chib, S., 1998. Stochastic volatility: likelihood inference and comparison with ARCH models. Rev. Econom. Stud.
- Lai, T.L., Wong, S.P., 2001. Stochastic neural networks with applications to nonlinear time series. J. Amer. Statist. Assoc.