Pattern Recognition, Volume 44, Issue 2, February 2011, Pages 295-306

A variational Bayesian methodology for hidden Markov models utilizing Student's-t mixtures

https://doi.org/10.1016/j.patcog.2010.09.001

Abstract

The Student's-t hidden Markov model (SHMM) has recently been proposed as a robust-to-outliers form of conventional continuous-density hidden Markov models, trained by means of the expectation–maximization (EM) algorithm. In this paper, we derive a tractable variational Bayesian inference algorithm for this model. The proposed approach provides an efficient and more robust alternative to EM-based methods, tackling their proneness to singularities and overfitting, while allowing for the automatic determination of the optimal model size without cross-validation. We highlight the superiority of the proposed model over the competition using synthetic and real data. We also demonstrate the merits of our methodology in applications from diverse research fields, such as human–computer interaction, robotics and semantic audio analysis.

Introduction

The hidden Markov model (HMM) is increasingly being adopted in applications, since it provides a convenient way of modeling observations that appear in a sequential manner and tend to cluster or to alternate between different possible components (subpopulations). Specifically, HMMs with continuous observation densities have been used in a wide spectrum of applications, including ecology, encryption, image understanding, speech recognition, and machine vision [1]. The observation (emission) densities associated with each state of a continuous HMM must be capable of approximating arbitrarily complex probability density functions. Finite Gaussian mixture models (GMMs) are the most common choice of emission distribution model in the continuous HMM literature [2]. Their popularity stems from the well-known capability of GMMs to successfully approximate unknown random distributions, including distributions with multiple modes, while also providing a simple and computationally efficient maximum-likelihood (ML) estimation framework through the expectation–maximization (EM) algorithm [3]. Nevertheless, GMMs suffer from a significant drawback: their parameter estimation procedure is well known to be adversely affected by the presence of outliers in the datasets used for model fitting.

To tackle these issues, we proposed in [4] a novel form of continuous HMM in which the hidden state distributions are modeled using finite mixtures of multivariate Student's-t densities. The multivariate Student's-t distribution is a bell-shaped distribution with heavier tails than the Gaussian; as a consequence, Student's-t mixture models (SMMs) provide an alternative to GMMs for probabilistic generative modeling, with high robustness to outliers in the training data. The resulting Student's-t hidden Markov model (SHMM) was treated in [4] under the ML paradigm using the EM algorithm; as shown there, the SHMM provides an effective, computationally efficient and application-independent means for outlier-tolerant representation and classification of sequential data by means of continuous HMMs.

In this paper, we provide an alternative treatment of the SHMM under a Bayesian framework using a variational approximation, yielding the variational Bayesian SHMM (VB-SHMM). Variational Bayesian treatments of statistical models present significant advantages over ML-based alternatives: ML approaches have the undesirable property of being ill-posed, since the likelihood function is unbounded from above [5], [6], [7]. This fact results in several very significant shortcomings. To begin with, a major difficulty concerns the infinities that plague the likelihood function, associated with the collapse of the bell-shaped component distributions onto individual data points, resulting in singular or near-singular covariance matrices [7]. The adoption of a Bayesian model inference algorithm, which provides posterior distributions over the model parameters instead of point estimates, allows for the natural resolution of these issues [5], [6], [7]. Another central issue confronting ML treatments of generative models concerns the selection of the optimal model size. Maximum likelihood is unable to address this issue, since it favors models of ever-increasing complexity, thus leading to overfitting [17], [10].
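
To make the singularity issue concrete, consider a single Gaussian mixture component that collapses onto a data point x_n, i.e., whose mean equals x_n and whose covariance shrinks as σ²I: its contribution to the likelihood is N(x_n | x_n, σ²I) = (2πσ²)^{-d/2}, which grows without bound as σ → 0. Placing priors over the covariance (or precision) matrices, as done in the Bayesian treatment developed below, assigns vanishing posterior mass to such degenerate configurations.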

In our work, we conduct a Bayesian treatment of the SHMM, elegantly overcoming the problems of ML approaches by marginalizing over the model parameters with respect to appropriate priors. The resulting model (marginal) likelihood can then be maximized with respect to the model size, if one aims at optimal model selection, or combined with a prior over the model size if the goal is model averaging [17], [16]. Our approach is based on variational approximation methods [8], which have recently emerged as a deterministic alternative to Markov chain Monte Carlo (MCMC) algorithms for performing Bayesian inference in probabilistic generative models [9], [10], with better scalability in terms of computational cost [11]. Variational Bayesian inference has previously been applied to relevance vector machines [12], GMMs [13], autoregressive models [14], [15], SMMs [16], [17], mixtures of factor analyzers [18], [19], [20], discrete HMMs [21], Gaussian HMMs [22], as well as HMMs with Poisson and autoregressive observation models [23], thereby ameliorating the singularity and overfitting problems of ML approaches.

The remainder of this paper is organized as follows: In Section 2, a brief review of the SHMM is provided. In Section 3, the proposed variational Bayesian treatment of the SHMM is carried out, yielding the variational Bayesian SHMM algorithm. In Section 4, the experimental evaluation of the proposed algorithm is conducted, considering a series of data modeling and classification applications and using real-world datasets. In the final section, our results are summarized and discussed.


The Student's-t HMM

Let us consider an N-state HMM where the hidden emission density of each state is modeled by a K-component finite mixture model. When the component distributions of these mixtures are multivariate Student's-t distributions, the Student's-t HMM is obtained. The pdf of a d-dimensional Student's-t distribution with mean μ, precision matrix R, and ν degrees of freedom is given by

$$ t(\mathbf{x}_t \mid \boldsymbol{\mu}, \mathbf{R}, \nu) = \frac{\Gamma\!\left(\frac{\nu+d}{2}\right)\, |\mathbf{R}|^{1/2}}{(\pi\nu)^{d/2}\, \Gamma(\nu/2)\, \bigl\{ 1 + \mathrm{MD}(\mathbf{x}_t, \boldsymbol{\mu}; \mathbf{R})/\nu \bigr\}^{(\nu+d)/2}} $$

where $\mathrm{MD}(\mathbf{x}_t, \boldsymbol{\mu}; \mathbf{R}) = (\mathbf{x}_t - \boldsymbol{\mu})^{\mathrm{T}} \mathbf{R} (\mathbf{x}_t - \boldsymbol{\mu})$ denotes the squared Mahalanobis distance between x_t and μ with precision matrix R.
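
As a concrete illustration of this emission density, the following is a minimal sketch (in Python with NumPy/SciPy rather than the paper's Matlab code, and with a function name of our own choosing) of how the log of the Student's-t density above can be evaluated, parameterized by the mean μ, the precision matrix R and the degrees of freedom ν.

```python
import numpy as np
from scipy.special import gammaln

def student_t_logpdf(x, mu, R, nu):
    """Log-density of the d-dimensional Student's-t distribution
    t(x | mu, R, nu) with mean mu, precision matrix R (assumed
    positive definite) and nu degrees of freedom."""
    d = mu.shape[0]
    diff = x - mu
    maha = float(diff @ R @ diff)          # MD(x, mu; R): squared Mahalanobis distance
    _, logdet_R = np.linalg.slogdet(R)     # log |R|
    return (gammaln(0.5 * (nu + d)) - gammaln(0.5 * nu)
            + 0.5 * logdet_R
            - 0.5 * d * np.log(np.pi * nu)
            - 0.5 * (nu + d) * np.log1p(maha / nu))

# Example: an outlier at (8, 8) is penalized far less severely under a
# heavy-tailed Student's-t (nu = 3) than in the near-Gaussian regime.
mu, R = np.zeros(2), np.eye(2)
print(student_t_logpdf(np.array([8.0, 8.0]), mu, R, nu=3.0))   # ~ -11.3
print(student_t_logpdf(np.array([8.0, 8.0]), mu, R, nu=1e6))   # ~ -65.8 (close to Gaussian)
```

As ν → ∞ the density tends to a Gaussian with mean μ and precision R, whereas small values of ν yield the heavier tails responsible for the model's tolerance to outliers.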

Variational Bayesian inference for the SHMM

Variational Bayesian inference for the SHMM comprises the introduction of a set of prior distributions over the model parameters and the subsequent maximization of the log marginal likelihood (log evidence) of the resulting model. For convenience, we choose priors conjugate to the considered observable and latent data, as this selection greatly simplifies inference and interpretability [8]. Thus, the prior over the initial-state probabilities vector π is chosen to be a Dirichlet distribution.
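
To give a flavor of the resulting conjugate updates, the sketch below implements the standard variational posterior update for the initial-state probabilities under such a Dirichlet prior, as commonly used in variational treatments of HMMs; the names eta_prior and gamma1 are our own, since the snippet above is truncated before the full prior specification.

```python
import numpy as np
from scipy.special import digamma

def update_initial_state_posterior(eta_prior, gamma1):
    """Variational update of q(pi) for the initial-state probabilities pi
    of an N-state HMM under a conjugate Dirichlet prior.

    eta_prior : (N,) hyperparameters of the Dirichlet prior over pi
    gamma1    : (N,) expected responsibilities of the hidden states for
                the first observation, computed in the variational E-step
    Returns the posterior Dirichlet parameters and E_q[log pi_i], the
    quantity required by the forward-backward recursions of the E-step.
    """
    eta_post = eta_prior + gamma1
    expected_log_pi = digamma(eta_post) - digamma(eta_post.sum())
    return eta_post, expected_log_pi
```

Analogous conjugate updates apply to the rows of the state transition matrix (Dirichlet) and to the mixture component parameters; the degrees of freedom ν, for which no conjugate prior exists, are typically handled by directly optimizing the variational lower bound.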

Experimental evaluation

In this section, we provide a thorough experimental evaluation of the VB-SHMM algorithm in a series of sequential data modeling applications from diverse domains. Our experiments were developed in Matlab R2008a and executed on a Macintosh platform with an Intel Core 2 Duo 2 GHz CPU and 2 GB of RAM, running Mac OS X 10.5.

Discussion

Hidden Markov models are a well-established technique for sequential data modeling and classification. Typically, HMMs with continuous observation distributions employ Gaussian mixture models as their hidden state densities. Nevertheless, this selection may considerably undermine HMM performance when noise contaminates the training data, due to the well-known intolerance of GMMs to outliers. To mitigate this shortcoming, the replacement of Gaussian mixture models with finite mixtures of multivariate Student's-t densities has been proposed [4].


References (33)

  • K. Yamazaki et al., Singularities in mixture models and upper bounds of stochastic complexity, Neural Networks (2003)
  • C. Archambeau et al., Robust Bayesian clustering, Neural Networks (2007)
  • M. Svensén et al., Robust Bayesian mixture modelling, Neurocomputing (2005)
  • M. Kudo et al., Multidimensional curve classification using passing-through regions, Pattern Recognition Letters (1999)
  • O. Cappé, E. Moulines, T. Rydén, Inference in Hidden Markov Models, Springer Series in Statistics, New York, ...
  • L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE (1989)
  • A. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (1977)
  • S. Chatzis et al., Robust sequential data modeling using an outlier tolerant hidden Markov model, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009)
  • C. Archambeau, J.A. Lee, M. Verleysen, On the convergence problems of the EM algorithm for finite Gaussian mixtures, ...
  • G. McLachlan, D. Peel, Finite Mixture Models, Wiley Series in Probability and Statistics, New York, ...
  • C.M. Bishop, Pattern Recognition and Machine Learning (2006)
  • J. Diebolt et al., Estimation of finite mixture distributions through Bayesian sampling, Journal of the Royal Statistical Society, Series B (1994)
  • S. Richardson et al., On Bayesian analysis of mixtures with an unknown number of components, Journal of the Royal Statistical Society, Series B (1997)
  • M.I. Jordan et al., An introduction to variational methods for graphical models
  • C.M. Bishop, M.E. Tipping, Variational relevance vector machines, in: Proceedings of the 16th Conference on Uncertainty ...
  • C. Constantinopoulos et al., Unsupervised learning of Gaussian mixtures based on variational component splitting, IEEE Transactions on Neural Networks (2007)

    Sotirios P. Chatzis received the M.Eng. in Electrical and Computer Engineering from the National Technical University of Athens, in 2005, and the Ph.D. in Machine Learning, in 2008, from the same institution. From January 2009 till June 2010 he was a Postdoctoral Fellow with the University of Miami, USA. Currently, he is with the Department of Electrical and Electronic Engineering, Imperial College, London. His current research interests are in the field of statistical machine learning, with a focus on sparse Bayesian classifiers, transfer learning, reinforcement learning, preference learning, and their applications to long-term human–robot interaction.

    Dimitrios I. Kosmopoulos received the B.Sc. in Electrical and Computer Engineering in 1997 from the National Technical University of Athens and the Ph.D. degree in 2002 from the same institution. He has worked on many research and industrial projects in the field of robotics and computer vision. Before joining NCSR Demokritos as a Research Scientist, he was employed at OMRON (Japan), the Federal Institute of Physics and Metrology, and Inos Automations software (Germany). He is also an Adjunct Assistant Professor at the University of Central Greece and at the Technical Educational Institute of Athens. He has also served as an Adjunct Lecturer at the University of Peloponnese.
