A variational Expectation–Maximization algorithm for temporal data clustering
Introduction
Cluster analysis, which consists in automatically identifying groups within data sets, remains a central issue in many applications including web data mining, marketing, bio-informatics, image segmentation and text mining. The Gaussian mixture model (GMM) (McLachlan and Peel, 2004, Titterington et al., 1985), used conjointly with the Expectation–Maximization (EM) algorithm (Dempster et al., 1977), is well known to provide powerful clustering solutions. However, some challenges still remain for the processing of non-stationary data.
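To make this standard setting concrete, the EM fit of a simple Gaussian mixture can be sketched as follows. This is a minimal, one-dimensional NumPy toy (not the estimator studied in this paper); the function name and initialization strategy are ours.

```python
import numpy as np

def em_gmm(y, K, n_iter=50):
    """Minimal EM for a one-dimensional Gaussian mixture (toy sketch)."""
    n = len(y)
    pi = np.full(K, 1.0 / K)                       # mixing proportions
    mu = np.quantile(y, (np.arange(K) + 0.5) / K)  # spread initial means over the data
    var = np.full(K, np.var(y))                    # common initial variance
    for _ in range(n_iter):
        # E step: responsibilities p(z_i = k | y_i) under the current parameters
        dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: closed-form updates of proportions, means and variances
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp * y[:, None]).sum(axis=0) / nk
        var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var, resp.argmax(axis=1)
```

On two well-separated clusters, a few dozen iterations recover the component means and proportions; the challenge addressed in this paper is precisely that such a static mixture ignores any temporal drift of the cluster centers.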
This study was motivated by the clustering of temporal data acquired on critical railway components, in order to characterize the dynamics of their degradation. Its final objective is to build a decision support tool for their preventive maintenance. To solve this problem, we propose to automatically extract, from temporal data, clusters whose centers evolve over time.
The general situation, where a set of multivariate observations is acquired at each time step, is considered in this article. Fig. 1 shows an example of such temporal data, with, for instance, three observations at one time step and five observations at another.
One way to address this specific clustering problem is to assume that the data are distributed according to a Gaussian mixture model whose centers are linear functions of time (DeSarbo and Cron, 1988, Wedel and DeSarbo, 1995). However, a linear evolution of the clusters may turn out to be inadequate for complex nonlinear dynamics. For tracking time-varying spike shapes, Calabrese and Paninski (2011) have proposed a method which consists of maximizing the log-likelihood criterion associated with the classical Gaussian mixture model, penalized by a term that accounts for the temporal evolution of the clusters.
In this work, a dynamic latent variable model dedicated to temporal data clustering is introduced. The frequentist approach was adopted to estimate its parameters. Unfortunately, estimating the parameters of this model by the maximum likelihood approach via the EM algorithm is intractable. Difficulties arise in the E step due to the dynamic structure of the model, which requires integrations over all possible configurations of the hidden variables. Some approximations are therefore required.
Viewing EM as the alternate optimization of an auxiliary function, respectively with respect to the distribution over the latent variables and the parameters (Neal and Hinton, 1998), a variational approximation is proposed in this paper. The idea is to restrict the latter optimization problem to a family of distributions over the latent variables which can be factorized into independent factors. In this case, a lower bound of the log-likelihood criterion is maximized. Variational inference, which was initiated in the mid-1990s, is usually applied to complex models involving missing values or based on latent structures, when the direct implementation of standard EM is difficult to achieve (Jaakkola and Jordan, 1997, Jordan et al., 1998). It has been proved to provide relevant estimates of mixture models in different configurations (Govaert and Nadif, 2008).
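Concretely, the variational construction rests on the standard decomposition of the log-likelihood (a generic sketch, with q an arbitrary distribution over the latent variables z and θ the model parameters):

```latex
\log p(y;\theta)
= \underbrace{\mathbb{E}_{q}\!\left[\log \frac{p(y,z;\theta)}{q(z)}\right]}_{\mathcal{F}(q,\theta)}
+ \mathrm{KL}\!\left(q(z)\,\|\,p(z\mid y;\theta)\right),
\qquad q(z)=\prod_i q_i(z_i).
```

Since the Kullback–Leibler term is nonnegative, F(q, θ) is a lower bound of the log-likelihood; the variational E step maximizes this bound over the factorized family, and the M step maximizes it over θ.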
Alternative methods can be used to tackle the maximum likelihood problem, such as stochastic versions of EM (see McLachlan and Krishnan, 2008, chap. 6). For instance, the SEM-Gibbs algorithm proposed by Keribin et al. (2010) runs a Gibbs sampler to simulate the unknown labels. However, we opted for a variational approximation, which is computationally more attractive.
The paper is organized as follows. Section 2 briefly reviews the penalized maximum likelihood approach of Calabrese and Paninski (2011). In Section 3, a dynamic model is formalized for clustering temporal data, and a new parameter estimation method based on a variational EM algorithm is presented. An incremental version of the proposed algorithm is formulated in Section 4. Experiments carried out on simulated and real data are presented in Section 5. Finally, conclusions and future works are proposed in Section 6.
The following notations will be used throughout this paper: y = (y_1, …, y_T) denotes the sequence of observed data to be classified, where each y_t is itself a sub-sample of n_t multivariate observations (y_{t1}, …, y_{t n_t}), with y_{ti} ∈ R^d. The unobserved classes associated to the observations will be denoted by z = (z_1, …, z_T), where z_t = (z_{t1}, …, z_{t n_t}), with z_{ti} ∈ {1, …, K}. The means of the K Gaussian densities at time t will be denoted as μ_{t1}, …, μ_{tK}.
To simplify the notations, the sums and products relative to time, observations at each time and clusters will be subscripted respectively by the letters t, i and k, without indicating the limits of variation. So, for instance, the sum Σ_t stands for Σ_{t=1..T}, the sum Σ_i stands for Σ_{i=1..n_t}, Σ_k stands for Σ_{k=1..K} and Π_t stands for Π_{t=1..T}.
Section snippets
A penalized likelihood approach for temporal data clustering
This section gives a brief review of the temporal data clustering approach introduced by Calabrese and Paninski (2011), which consists in maximizing the log-likelihood criterion associated with the classical Gaussian mixture model (GMM), penalized by a term that accounts for the temporal evolution of the clusters. When several observations are acquired at each time t, the Calabrese and Paninski (2011) criterion can be modified as follows:
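Consistent with this description, a penalized criterion of the following general form can be written (a sketch rather than the authors' exact expression; φ denotes the Gaussian density and λ ≥ 0 a smoothness parameter weighting the temporal penalty):

```latex
\mathcal{J}(\theta,\mu)
= \sum_{t}\sum_{i}\log\sum_{k}\pi_{k}\,
\varphi\!\left(y_{ti};\,\mu_{tk},\,\Sigma_{k}\right)
\;-\;\lambda\sum_{t}\sum_{k}\left\|\mu_{tk}-\mu_{t-1,k}\right\|^{2}.
```

The first term is the usual GMM log-likelihood accumulated over all time steps, while the second discourages abrupt jumps of each cluster center between consecutive time steps.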
A dynamic probabilistic model for temporal data clustering
A more general model dedicated to temporal data clustering is proposed in this section. In contrast to the penalized likelihood approach described in the previous section, which assumes a prior distribution over the cluster centers, the new approach does not consider them as parameters but rather as random variables. After introducing its generative formulation and discussing model identifiability, a variational EM algorithm is developed for parameter estimation.
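Under the assumptions stated above, the generative mechanism can be sketched as follows (an illustrative formulation; the noise covariances Λ_k and Σ_k stand for generic model parameters):

```latex
\mu_{tk} = \mu_{t-1,k} + \eta_{tk}, \qquad \eta_{tk}\sim\mathcal{N}(0,\,\Lambda_{k})
\quad \text{(random-walk cluster centers)},
```
```latex
z_{ti} \sim \mathrm{Mult}(1;\,\pi_{1},\dots,\pi_{K}), \qquad
y_{ti}\mid z_{ti}=k \;\sim\; \mathcal{N}(\mu_{tk},\,\Sigma_{k}).
```

Treating the centers μ_{tk} as latent random variables, rather than parameters, is what couples the time steps together and makes the exact E step intractable.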
Sequential clustering algorithm
A recursive version of VEM-DyMix, called OVEM-DyMix, is introduced in this section for online learning of the parameters, which can be useful for incoming non-stationary data streams and large data sets. The unknown parameters are recursively estimated whenever new data become available.
To derive the OVEM-DyMix algorithm, the required notations and an auxiliary function are first defined.
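The flavor of such a recursive scheme can be conveyed by a simple stochastic-approximation update of the cluster centers when a new batch of observations arrives. This is an illustrative sketch only, not the exact OVEM-DyMix recursion; the function and argument names are ours.

```python
import numpy as np

def recursive_center_update(mu, y_batch, resp, step):
    """One recursive update of cluster centers from a new data batch.

    mu      : (K, d) current center estimates
    y_batch : (n, d) newly acquired observations
    resp    : (n, K) responsibilities of the clusters for the new batch
    step    : learning rate in (0, 1]
    """
    nk = resp.sum(axis=0)  # effective batch counts per cluster
    batch_means = (resp.T @ y_batch) / np.maximum(nk[:, None], 1e-12)
    # Move each center toward the responsibility-weighted batch mean
    return mu + step * (batch_means - mu)
```

With a decreasing step size, updates of this kind converge under standard stochastic-approximation conditions, which is what makes recursive variants attractive for data streams.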
Experimental study
This section is devoted to the evaluation of the proposed algorithms on several synthetic data sets in terms of clustering quality, estimation accuracy and computation time. The proposed algorithms are compared to other state-of-the-art algorithms. Then, our algorithms are applied to two real data sets.
Conclusion and future work
A new approach dedicated to temporal data clustering has been proposed in this article. The dynamic model associated with this approach assumes that the cluster centers are latent random variables which evolve over time according to random walks. As a direct application of the EM algorithm to maximize the log-likelihood is intractable, a variational approximation is proposed to estimate the parameters of the model. Moreover, the assumption of Gaussian classes leads to efficient closed-form update equations.
Acknowledgments
This work was conducted within the framework of DIADEM, a project supported by ANR. Furthermore, the authors would like to thank Mr. Marc Antoni of SNCF for the data he provided.
References (27)
- Calabrese, A., Paninski, L. (2011). Kalman filter mixture model for spike sorting of non-stationary data. J. Neurosci. Methods.
- Govaert, G., Nadif, M. (2008). Block clustering with Bernoulli mixture models: comparison of different approaches. Comput. Statist. Data Anal.
- Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol.
- DeSarbo, W.S., Cron, W.L. (1988). A maximum likelihood methodology for clusterwise linear regression. J. Classification.
- Durbin, J., Koopman, S.J. (2012). Time Series Analysis by State Space Methods.
- Graphical models and variational methods.
- Ghahramani, Z., Hinton, G.E. (2000). Variational learning for switching state-space models. Neural Comput.
- Harvey, A.C. (1990). Forecasting, Structural Time Series Models and the Kalman Filter.
- Jaakkola, T.S. (1997). Variational methods for inference and estimation in graphical models.