A variational Expectation–Maximization algorithm for temporal data clustering

https://doi.org/10.1016/j.csda.2016.05.007

Abstract

The problem of temporal data clustering is addressed using a dynamic Gaussian mixture model. In addition to the hidden cluster labels used in the classical Gaussian mixture model, the proposed approach assumes that the means of the Gaussian densities are latent variables distributed according to random walks. The parameters of the proposed model are estimated by the maximum likelihood approach. However, the EM algorithm cannot be applied directly due to the complex structure of the model, and some approximations are required. Using a variational approximation, an algorithm called VEM-DyMix is proposed to estimate the parameters of the proposed model. Using simulated data, the ability of the proposed approach to accurately estimate the parameters is demonstrated. VEM-DyMix outperforms, in terms of clustering and estimation accuracy, other state-of-the-art algorithms. The experiments performed on real-world data from two fields of application (railway condition monitoring and object tracking from videos) show the strong potential of the proposed algorithms.

Introduction

Cluster analysis, which consists in automatically identifying groups in data sets, remains a central issue in many applications including web data mining, marketing, bio-informatics, image segmentation and text mining. The Gaussian mixture model (GMM) (McLachlan and Peel, 2004, Titterington et al., 1985), used in conjunction with the Expectation–Maximization (EM) algorithm (Dempster et al., 1977), is well known to provide powerful clustering solutions. However, some challenges still remain for the processing of non-stationary data.

This study was motivated by the clustering of temporal data acquired on some critical railway components, in order to characterize the dynamics of their degradation. Its final objective is to build a decision support tool for their preventive maintenance. To solve this problem, we propose to automatically extract, from temporal data, clusters whose centers evolve over time.

The general situation, where at each time a set of multivariate observations is acquired, is considered in this article. Fig. 1 shows an example of such temporal data, where we have, for instance, three observations at t=1 and five observations at t=2.

One way to address this specific clustering problem is to assume that the data are distributed according to a Gaussian mixture model whose centers are linear functions of time (DeSarbo and Cron, 1988, Wedel and DeSarbo, 1995). However, a linear evolution of the clusters may prove inadequate for complex nonlinear dynamics. For tracking time-varying spike shapes, Calabrese and Paninski (2011) have proposed a method which consists of maximizing the log-likelihood criterion associated to the classical Gaussian mixture model, penalized by a term that takes into account the temporal evolution of the clusters.

In this work, a dynamic latent variable model dedicated to temporal data clustering is introduced. The frequentist approach was adopted to estimate its parameters. Unfortunately, estimating the parameters of this model by the maximum likelihood approach via the EM algorithm is intractable. Difficulties arise in the E step due to the dynamic structure of the model, which requires integrations over all possible configurations of the hidden variables. Some approximations are therefore required.

Viewing EM as the alternate optimization of an auxiliary function, respectively with respect to the distribution over the latent variables and the parameters (Neal and Hinton, 1998), a variational approximation is proposed in this paper. The idea is to restrict the latter optimization problem to a family of distributions over the latent variables which can be factorized into independent factors. In this case, a lower bound of the log-likelihood criterion is maximized. Variational inference, which was initiated in the mid-1990s, is usually applied to complex models involving missing values or based on latent structures, when the direct implementation of standard EM is difficult (Jaakkola and Jordan, 1997, Jordan et al., 1998). It has been proved to provide relevant estimates of mixture models in different configurations (Govaert and Nadif, 2008).
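The view of EM as alternating maximization of an auxiliary lower bound can be made concrete on an ordinary one-dimensional Gaussian mixture. The sketch below (an illustration, not the VEM-DyMix algorithm of this paper) computes the bound F(q, θ) = E_q[log p(x, z | θ)] − E_q[log q(z)] and runs the two alternating steps; in this simple case the E step sets q to the exact posterior and the bound becomes tight, whereas a variational method would restrict q to a factorized family:

```python
import numpy as np
from scipy.stats import norm

def elbo(x, q, pi, mu, sigma):
    """Lower bound F(q, theta) = E_q[log p(x, z)] - E_q[log q(z)]
    for a 1-D Gaussian mixture; q is an (n, K) matrix of responsibilities."""
    log_joint = np.log(pi) + norm.logpdf(x[:, None], mu, sigma)  # (n, K)
    return np.sum(q * log_joint) - np.sum(q * np.log(q + 1e-300))

# Two-component toy data.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 50), rng.normal(3, 1, 50)])
pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(20):
    # E step: q <- exact posterior responsibilities (bound becomes tight).
    logw = np.log(pi) + norm.logpdf(x[:, None], mu, sigma)
    q = np.exp(logw - logw.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)
    # M step: theta <- argmax of the bound given q.
    nk = q.sum(axis=0)
    pi = nk / len(x)
    mu = (q * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((q * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
```

In the dynamic model of this paper the exact posterior over (z, μ) is intractable, which is precisely where the factorized variational family replaces the exact E step.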

Alternative methods can be used to tackle the maximum likelihood problem, such as stochastic versions of EM (see McLachlan and Krishnan, 2008, chap. 6). For instance, the SEM-Gibbs algorithm proposed by Keribin et al. (2010) runs a Gibbs sampler to simulate the unknown labels. However, we opted for a variational approximation, which is computationally more attractive.

The paper is organized as follows. Section 2 briefly reviews the penalized maximum likelihood approach of Calabrese and Paninski (2011). In Section 3, a dynamic model is formalized for clustering temporal data, and a new parameter estimation method based on a variational EM algorithm is presented. An incremental version of the proposed algorithm is formulated in Section 4. Experiments carried out on simulated and real data are presented in Section 5. Finally, conclusions and future work are proposed in Section 6.

The following notations will be used throughout this paper: x = (x_1, …, x_T) denotes the sequence of T observed data to be classified, where x_t is itself a sub-sample of n_t multivariate observations (x_t1, …, x_tn_t), with x_ti ∈ ℝ^d for i = 1, …, n_t. The unobserved classes associated to the observations will be denoted by z = (z_1, …, z_T), where z_t = (z_t1, …, z_tn_t), with z_ti ∈ {1, …, K}. The means of the Gaussian densities will be denoted by μ = (μ_k(t); t = 1, …, T, k = 1, …, K).

To simplify the notations, the sums and products relative to time, to the observations at each time and to the clusters will be subscripted respectively by the letters t, i, k without indicating the limits of variation. So, for instance, the sum ∑_t stands for ∑_{t=1}^{T}, the sum ∑_i stands for ∑_{i=1}^{n_t}, the sum ∑_{t,i,k} stands for ∑_{t=1}^{T} ∑_{i=1}^{n_t} ∑_{k=1}^{K} and the product ∏_{t,i,k} stands for ∏_{t=1}^{T} ∏_{i=1}^{n_t} ∏_{k=1}^{K}.

Section snippets

A penalized likelihood approach for temporal data clustering

This section gives a brief review of the temporal data clustering approach introduced by Calabrese and Paninski (2011), which consists in maximizing the log-likelihood criterion associated to the classical Gaussian mixture model (GMM), penalized by a term that takes into account the temporal evolution of the clusters. When several observations are acquired at each time t, the Calabrese and Paninski (2011) criterion can be modified as follows: ℒ(θ) = ∑_{t,i} log ∑_k π_k φ(x_ti; μ_k(t), σ_k² I) + ∑_{k,t} log φ(μ_k(t); μ_k(t−1), λ² I), where the second term penalizes abrupt changes of the cluster centers from one time step to the next.
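The penalized criterion combines a standard GMM likelihood term per observation with a Gaussian smoothness term tying each center μ_k(t) to μ_k(t−1). A minimal sketch of its evaluation, assuming spherical covariances and a penalty variance `lam2` (an illustrative smoothness parameter, not a value taken from the paper):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def penalized_loglik(x, pi, mu, sigma2, lam2):
    """Penalized log-likelihood in the spirit of Calabrese and Paninski (2011):
    a GMM term for each observation, plus a random-walk penalty tying each
    cluster center mu_k(t) to mu_k(t-1).

    x      : list of length T; x[t] is an (n_t, d) array of observations
    pi     : (K,) mixing proportions
    mu     : (T, K, d) cluster centers over time
    sigma2 : (K,) spherical cluster variances
    lam2   : assumed penalty (smoothness) variance
    """
    T, K, d = mu.shape
    ll = 0.0
    for t in range(T):
        # Mixture density of each observation acquired at time t.
        dens = np.stack([pi[k] * mvn.pdf(x[t], mu[t, k], sigma2[k] * np.eye(d))
                         for k in range(K)], axis=-1)
        ll += np.log(dens.sum(axis=-1)).sum()
    # Penalty: Gaussian random-walk term on consecutive centers.
    for t in range(1, T):
        for k in range(K):
            ll += mvn.logpdf(mu[t, k], mu[t - 1, k], lam2 * np.eye(d))
    return ll
```

Smooth center trajectories incur a small penalty, while abrupt jumps lower the criterion, which is what drives the temporal regularization.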

A dynamic probabilistic model for temporal data clustering

A more general model dedicated to temporal data clustering is proposed in this section. In contrast to the penalized likelihood approach described in the previous section, which assumes a prior distribution over the cluster centers, the new approach does not consider them as parameters but rather as random variables. After introducing its generative formulation and discussing the model's identifiability, a variational EM algorithm is developed for parameter estimation.
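The generative story described above — cluster centers following Gaussian random walks, and each observation drawn from the Gaussian component indicated by its hidden label — can be sketched as a simulator. Function name, signature and parameter names below are illustrative, not taken from the paper:

```python
import numpy as np

def simulate_dymix(T, n_t, pi, mu0, tau2, sigma2, d=2, seed=0):
    """Draw a sample from the dynamic mixture generative model:
    each center follows a random walk mu_k(t) = mu_k(t-1) + eps_t,
    eps_t ~ N(0, tau2 I); each observation at time t picks a cluster
    z ~ pi and is drawn from N(mu_z(t), sigma2 I)."""
    rng = np.random.default_rng(seed)
    K = len(pi)
    # Random-walk trajectories of the K cluster centers.
    mu = np.empty((T, K, d))
    mu[0] = mu0
    for t in range(1, T):
        mu[t] = mu[t - 1] + rng.normal(0.0, np.sqrt(tau2), size=(K, d))
    # Observations and hidden labels at each time step.
    x, z = [], []
    for t in range(T):
        zt = rng.choice(K, size=n_t, p=pi)
        x.append(mu[t, zt] + rng.normal(0.0, np.sqrt(sigma2), size=(n_t, d)))
        z.append(zt)
    return x, z, mu
```

Simulating from the model in this way is also how the estimation accuracy of VEM-DyMix is assessed on synthetic data in the experimental section.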

Sequential clustering algorithm

A recursive version of VEM-DyMix called OVEM-DyMix is introduced in this section for online learning of the parameters, which can be useful for incoming non-stationary data streams and large data sets. The unknown parameters are recursively estimated whenever new data become available.
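To give the flavor of such a recursive scheme, the sketch below performs one generic online EM-style step on a newly arrived batch, blending old parameter estimates with sufficient statistics of the new data through a step size `rho`. This is a stochastic-approximation illustration only, not the OVEM-DyMix recursions, which are derived from the variational bound:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def online_update(xt, pi, mu, sigma2, rho):
    """One generic online EM-style step on the batch xt received at time t.
    xt: (n_t, d) new observations; pi: (K,); mu: (K, d); sigma2: (K,);
    rho in (0, 1) is the step size (forgetting factor)."""
    K, d = mu.shape
    # E step on the new batch only: responsibilities tau_tik.
    logw = np.log(pi) + np.stack(
        [mvn.logpdf(xt, mu[k], sigma2[k] * np.eye(d)) for k in range(K)],
        axis=-1)
    tau = np.exp(logw - logw.max(axis=1, keepdims=True))
    tau /= tau.sum(axis=1, keepdims=True)
    # Recursive M step: blend old estimates with the new statistics.
    nk = tau.sum(axis=0)
    pi_new = (1 - rho) * pi + rho * nk / len(xt)
    mu_new = (1 - rho) * mu + rho * (tau.T @ xt) / np.maximum(nk, 1e-12)[:, None]
    return pi_new, mu_new, tau
```

A decreasing step size gives more weight to the accumulated history, while a constant one tracks non-stationary streams, which is the regime targeted by the online algorithm.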

Let us define the notations: τ_1:t = (τ_jik; j = 1, …, t, i = 1, …, n_j, k = 1, …, K), τ_t = (τ_tik; i = 1, …, n_t, k = 1, …, K), m_1:t = (m_k(j); j = 1, …, t, k = 1, …, K), m(t) = (m_k(t); k = 1, …, K). In order to derive the OVEM-DyMix algorithm, the function F defined

Experimental study

This section is devoted to the evaluation of the proposed algorithms on several synthetic data sets, in terms of clustering, estimation accuracy and computation time. The proposed algorithms are compared to other state-of-the-art algorithms. Then, our algorithms are applied to two real data sets.

Conclusion and future work

A new approach dedicated to temporal data clustering has been proposed in this article. The dynamic model associated to this approach assumes that the cluster centers are latent random variables which evolve over time according to random walks. As a direct application of the EM algorithm to maximize the log-likelihood is intractable, a variational approximation is proposed to estimate the parameters of the model. Moreover, the assumption of Gaussian classes leads to efficient

Acknowledgments

This work was conducted within the framework of DIADEM, a project supported by the ANR. Furthermore, the authors would like to thank Mr. Marc Antoni of SNCF for the data he provided.

References (27)

  • A. Calabrese et al.

    Kalman filter mixture model for spike sorting of non-stationary data

    J. Neurosci. Methods

    (2011)
  • G. Govaert et al.

Block clustering with Bernoulli mixture models: comparison of different approaches

    Comput. Statist. Data Anal.

    (2008)
  • A.P. Dempster et al.

    Maximum likelihood from incomplete data via the EM algorithm

    J. R. Stat. Soc. Ser. B Stat. Methodol.

    (1977)
  • W.S. DeSarbo et al.

    A maximum likelihood methodology for clusterwise linear regression

    J. Classification

    (1988)
  • J. Durbin et al.

    Time Series Analysis by State Space Methods

    (2012)
  • Z. Ghahramani et al.

    Graphical models and variational methods

  • Z. Ghahramani et al.

    Variational learning for switching state-space models

    Neural Comput.

    (2000)
  • A.C. Harvey

    Forecasting, Structural Time Series Models and the Kalman Filter

    (1990)
  • T.S. Jaakkola et al.

    Variational methods for inference and estimation in graphical models

    (1997)
  • J.E. Jackson

    A User’s Guide to Principal Components, Vol. 587

    (2005)
  • M.I. Jordan et al.

    An Introduction to Variational Methods for Graphical Models

    (1998)
  • M.I. Jordan et al.

    An introduction to variational methods for graphical models

    Mach. Learn.

    (1999)