An online classification EM algorithm based on the mixture model

Statistics and Computing

Abstract

Mixture model-based clustering is widely used in many applications. In certain real-time applications the rapid increase of data size with time makes classical clustering algorithms too slow. An online clustering algorithm based on mixture models is presented in the context of a real-time flaw-diagnosis application for pressurized containers which uses data from acoustic emission signals. The proposed algorithm is a stochastic gradient algorithm derived from the classification version of the EM algorithm (CEM). It provides a model-based generalization of the well-known online k-means algorithm, able to handle non-spherical clusters. Using synthetic and real data sets, the proposed algorithm is compared with the batch CEM algorithm and the online EM algorithm. The three approaches generate comparable solutions in terms of the resulting partition when clusters are relatively well separated, but the online algorithms become faster than the batch algorithm as the number of available observations increases.
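
The abstract describes the proposed method as a model-based generalization of online k-means. As a point of reference only, the online k-means baseline can be sketched as below. This is not the authors' CEM-based algorithm: the names `online_kmeans` and `make_stream`, the synthetic two-cluster data, the initial centers, and the 1/n step size are all assumptions made for illustration.

```python
import random

def online_kmeans(stream, init_centers):
    """Online k-means sketch: one pass, one update per observation."""
    centers = [list(c) for c in init_centers]
    # Assumption: each initial center is treated as one prior observation.
    counts = [1] * len(centers)
    labels = []
    for x in stream:
        # Classification step: hard-assign x to the nearest center.
        k = min(
            range(len(centers)),
            key=lambda j: sum((xi - ci) ** 2 for xi, ci in zip(x, centers[j])),
        )
        # Stochastic gradient step with step size 1 / counts[k]; this keeps
        # centers[k] equal to the running mean of its initial value and the
        # points assigned to it so far.
        counts[k] += 1
        step = 1.0 / counts[k]
        centers[k] = [ci + step * (xi - ci) for xi, ci in zip(x, centers[k])]
        labels.append(k)
    return centers, labels

def make_stream(n, seed=0):
    """Hypothetical data source: two well-separated spherical Gaussian clusters."""
    rng = random.Random(seed)
    true_means = [(0.0, 0.0), (8.0, 8.0)]
    for _ in range(n):
        m = rng.choice(true_means)
        yield (rng.gauss(m[0], 1.0), rng.gauss(m[1], 1.0))

centers, labels = online_kmeans(
    make_stream(2000), init_centers=[(1.0, 1.0), (7.0, 7.0)]
)
print(centers)
```

Each observation is processed once and then discarded, so the cost per update does not grow with the number of observations seen. A classification-EM variant would presumably replace the squared-distance assignment with a comparison of weighted component densities and also update proportions and covariances, which is what would allow non-spherical clusters; those details are in the paper and are not reproduced in this sketch.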

Author information

Corresponding author

Correspondence to Allou Samé.

About this article

Cite this article

Samé, A., Ambroise, C. & Govaert, G. An online classification EM algorithm based on the mixture model. Stat Comput 17, 209–218 (2007). https://doi.org/10.1007/s11222-007-9017-z

  • DOI: https://doi.org/10.1007/s11222-007-9017-z
