Multivariate time series modeling and classification via hierarchical VAR mixtures

https://doi.org/10.1016/j.csda.2006.03.002

Abstract

A novel class of models for multivariate time series is presented. We consider hierarchical mixture-of-experts (HME) models in which the experts, or building blocks of the model, are vector autoregressions (VAR). It is assumed that the VAR-HME model partitions the covariate space, specifically including time as a covariate, into overlapping regions called overlays. In each overlay a given number of VAR experts compete with each other, so that the most suitable one for the overlay is favored by a large weight. The weights have a particular parametric form that allows the modeler to include relevant covariates. Estimation of the model parameters is achieved via the EM (expectation-maximization) algorithm. A new algorithm is also developed to select the optimal number of overlays, the number of VAR models, and the model orders of the VARs that define a particular VAR-HME model configuration. The algorithm uses the Bayesian information criterion (BIC) as an optimality criterion. Issues of model checking and inference of latent structure in multiple time series are investigated. The new methodology is illustrated by analyzing a synthetic data set and a 7-channel electroencephalogram data set.

Introduction

We propose a multivariate time series modeling approach based on the idea of mixing models through the paradigm known as hierarchical mixture-of-experts (HME) (Jordan and Jacobs, 1994). The HME approach easily allows for model mixing and permits the representation of the mixture weights as a function of time or other covariates. Our HME models assume that the components of the mixture are vector autoregressions (VAR). These models provide useful insight into the spatio-temporal characteristics of the data by modeling multiple time series jointly. In addition, the VAR-HME models can assess, in a probabilistic fashion, the different states of the multivariate time series over time by means of the estimated mixture weights.
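To make the weighting mechanism concrete, the sketch below implements a softmax-type (multinomial logistic) gating function of the general kind alluded to above, with time included among the covariates. This is a minimal illustration rather than the paper's exact parameterization; the names `gating_weights` and `V`, and the coefficient values, are ours.

```python
import numpy as np

def gating_weights(x, V):
    """Softmax (multinomial logistic) weights for a mixture of m experts.

    x : (l,) covariate vector at time t (here an intercept and scaled time)
    V : (m, l) gating coefficients, one row per expert
    Returns m nonnegative weights that sum to one.
    """
    z = V @ x
    z -= z.max()                  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Two experts with covariates (1, t/T): the second expert's weight grows
# with time, so the mixture can shift smoothly between regimes.
V = np.array([[0.0, 0.0],         # reference expert
              [-2.0, 5.0]])       # favored for large t
for t in (0.1, 0.5, 0.9):
    print(t, gating_weights(np.array([1.0, t]), V))
```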

Developments on univariate HME time series models can be found in Huerta et al. (2003). These authors show how to estimate the parameters of mixture-of-experts (ME) and HME models for univariate time series via the expectation-maximization (EM) algorithm and Markov chain Monte Carlo (MCMC) methods. Huerta et al. (2003) applied the HME methodology to model the monthly US industrial production index from 1947 to 1993. Specifically, an HME model was considered to discriminate between stochastic trend models and deterministic trend models. In this analysis time was the only covariate included in the model. More recently, Villagran and Huerta (2006) showed that the inclusion of additional covariates leads to substantial changes in the estimates of some of the model parameters in univariate mixture-of-experts models. In particular, the authors consider ME models for stochastic volatility in a time series of returns where time and the Dow Jones index are both covariates.

We present an extension of the HME developments of Huerta et al. (2003) to handle multivariate time series. We propose a novel class of models in which the mixture components, usually called experts in the neural network terminology, are vector autoregressions. VAR-HME models extend the univariate mixture of autoregressive (AR) models presented in Wong and Li (2000) and Wong and Li (2001) to the multivariate framework. Related univariate models, in which single-layer stochastic neural networks are used to model non-linear time series, are also developed in Lai and Wong (2001). The hierarchical structure of the VAR-HME models developed here allows the construction of very flexible models to describe the non-stationarities and non-linearities often present in multiple time series. Such hierarchical structure is not present in the univariate models developed in Wong and Li (2000), Wong and Li (2001) and Lai and Wong (2001). In addition, we design and discuss an algorithm for selecting an optimal VAR-HME model configuration, as sketched below. A particular VAR-HME configuration is defined by the number of components in the hierarchical mixture, i.e., the number of overlays and experts (or VARs), and the model order of each VAR. Our algorithm uses the Bayesian information criterion (BIC) as an optimality criterion.
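As a reading aid, a candidate configuration of the kind the search algorithm ranges over can be summarized by three quantities. The sketch below is merely a hypothetical container for them; the class name and field names are our own notation, not the paper's.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class VARHMEConfig:
    """One candidate configuration in the BIC-based model search."""
    n_overlays: int               # number of overlays (top level of the mixture)
    n_experts: int                # number of VAR experts competing per overlay
    var_orders: Tuple[int, ...]   # model order of each VAR expert

# e.g. two overlays, each mixing a VAR(1) and a VAR(3) expert
config = VARHMEConfig(n_overlays=2, n_experts=2, var_orders=(1, 3))
```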

The time series applications that motivate the VAR-HME modeling approach arise mainly in the area of biomedical signal processing, where multiple time series have two main characteristics. First, the series consist of multiple signals recorded simultaneously from a system under certain conditions. Second, each individual signal has an underlying structure, possibly but not necessarily quasi-periodic, that can be adequately modeled by a collection of univariate AR models, or by AR models with parameters that vary over time (TVAR). These are the characteristics of the multi-channel electroencephalogram (EEG) data analyzed in Section 4.2. The VAR-HME models constitute a new class of multivariate time series models that are non-linear and non-stationary, and so are suitable for modeling highly complex and non-stationary signals such as EEG traces. It is important to emphasize that the multivariate nature of the VAR-HME models developed here is a key feature. These models are able to capture latent processes that are common to several univariate time series by modeling them jointly. This could not be achieved by analyzing each series separately via univariate mixtures of AR models. Other potential areas of application for these models include seismic and speech signal processing and applications to environmental and financial data analysis.

The paper is organized as follows. Section 2 presents the mathematical formulation of the VAR-HME models and summarizes the EM algorithm for parameter estimation when the number of overlays, the number of models and the model orders of the VAR components are known. Section 3 describes an algorithm for selecting the number of overlays, models and model orders of the VARs using the Bayesian information criterion or BIC. Model checking issues are also discussed in Section 3. Section 4 presents the analyses of two datasets: a simulated data set and a 7-channel electroencephalogram data set. Finally, conclusions and future work are presented in Section 5.


Models and methodology

Let $\{y_t\}_{t=1}^{T}$ be a collection of $T$ $k$-dimensional time series vectors, and let $\{x_t\}_{t=1}^{T}$ be a collection of $T$ $l$-dimensional vectors of covariates indexed in time. Let the conditional probability density function (pdf) of $y_t$ be $f_t(y_t \mid \mathcal{F}_{t-1}, \mathcal{X}_T; \theta)$, where $\theta$ is a parameter vector; $\mathcal{X}_T$ is the $\sigma$-field generated by $\{x_t\}_{t=1}^{T}$, representing external information; and, for each $t$, $\mathcal{F}_{t-1}$ is the $\sigma$-field generated by $\{y_s\}_{s=1}^{t-1}$, representing the previous history at time $t-1$. Typically, the conditional pdf $f_t$ is assumed to depend on $\mathcal{X}_T$ …
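A minimal sketch of how such a conditional pdf can be evaluated when the experts are Gaussian VAR models, an assumption we make here purely for illustration (the function names and argument layout are ours, not the paper's):

```python
import numpy as np
from scipy.stats import multivariate_normal

def var_conditional_mean(y_hist, c, Phis):
    """Conditional mean of a Gaussian VAR(p) expert: c + sum_j Phi_j y_{t-j}.

    y_hist : list [y_{t-1}, ..., y_{t-p}] of k-vectors (most recent first)
    c      : (k,) intercept; Phis : list of (k, k) coefficient matrices
    """
    mean = c.copy()
    for Phi, y_lag in zip(Phis, y_hist):
        mean += Phi @ y_lag
    return mean

def mixture_pdf(y_t, y_hist, experts, weights):
    """f_t(y_t | F_{t-1}) as a weighted sum of VAR expert densities.

    experts : list of (c, Phis, Sigma) triples; weights sum to one.
    """
    total = 0.0
    for w, (c, Phis, Sigma) in zip(weights, experts):
        mu = var_conditional_mean(y_hist, c, Phis)
        total += w * multivariate_normal.pdf(y_t, mean=mu, cov=Sigma)
    return total
```

Summing the log of such a density over $t$ yields the kind of conditional log-likelihood that the EM algorithm of Section 2 works with.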

Model selection

We now describe a general algorithm that searches for the optimal VAR-HME for a given data set. Optimality here will be defined in terms of the Bayesian information criterion or BIC (Schwarz, 1978). In other words, the optimal VAR-HME model configuration $\mathcal{M}$ will be the one whose number of overlays, number of expert models, VAR model orders and associated parameter values minimize the BIC, which is defined in this framework as $$\mathrm{BIC}(\mathcal{M}, \theta_{\mathcal{M}}) = -2\, L_{t_0:t_1}(\theta_{\mathcal{M}}) + \dim(\theta_{\mathcal{M}}) \log(k T^{*}),$$ where $\dim(\theta_{\mathcal{M}})$ is the dimension …
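For concreteness, the displayed criterion translates directly into code. This sketch assumes the conditional log-likelihood $L_{t_0:t_1}$ and the parameter count of a fitted configuration are already available; the numbers in the usage lines are invented.

```python
import numpy as np

def bic(loglik, n_params, k, T_star):
    """BIC(M, theta_M) = -2 L_{t0:t1} + dim(theta_M) * log(k * T*).

    loglik   : conditional log-likelihood of the fitted configuration
    n_params : dim(theta_M), total number of free parameters
    k        : dimension of the series; T_star : effective sample size
    Smaller values are better under this criterion.
    """
    return -2.0 * loglik + n_params * np.log(k * T_star)

# Comparing two hypothetical fitted configurations on k = 2 series:
print(bic(loglik=-1234.5, n_params=18, k=2, T_star=495))
print(bic(loglik=-1228.9, n_params=34, k=2, T_star=495))
```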

Applications

In order to show the performance of the proposed models and search algorithm, two examples are considered. In the first example we analyze a simulated bivariate time series. In the second example we apply the VAR-HME models to 7-channel electroencephalogram data recorded during electroconvulsive therapy.

Conclusions and future directions

This paper advances multivariate time series methodology by combining hierarchical mixtures and vector autoregressive components. The VAR-HME models presented here constitute a time-domain approach to analyzing multivariate non-stationary time series; a multivariate frequency-domain approach appears in Ombao et al. (2005). One of the key features of our time-domain approach is that the estimates of the gating functions (i.e., the functions that define the weights of the mixture …

References

  • A. Carvalho et al., Mixtures-of-experts of autoregressive time series: asymptotic normality and model specification, IEEE Trans. Neural Networks (2005)
  • S. Chib et al., Marginal likelihood from the Metropolis-Hastings output, J. Amer. Statist. Assoc. (2001)
  • O. Elerian et al., Likelihood inference for discretely observed non-linear diffusions, Econometrica (2001)
  • P.J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika (1995)
  • G. Huerta et al., Discussion article: mixtures of time series models, J. Comput. Graphical Statist. (2001)
  • G. Huerta et al., Time series modeling via hierarchical mixtures, Statist. Sin. (2003)
  • G. Huerta, R. Prado, Structure priors for multivariate time series, J. Statist. Plann. Inference (2006), in press
  • M.I. Jordan et al., Hierarchical mixtures of experts and the EM algorithm, Neural Comput. (1994)
  • S. Kim et al., Stochastic volatility: likelihood inference and comparison with ARCH models, Rev. Econom. Stud. (1998)
  • T.L. Lai et al., Stochastic neural networks with applications to nonlinear time series, J. Amer. Statist. Assoc. (2001)
  • P. McCullagh et al., Generalized Linear Models (1983)