Abstract
In this paper, a distributed expectation maximization (DEM) algorithm is first introduced in a general form for estimating the parameters of a finite mixture of components. The algorithm is used for density estimation and clustering of data distributed over the nodes of a network. A distributed incremental EM (DIEM) algorithm with a higher convergence rate is then proposed. After a full derivation of the distributed EM algorithms, their convergence is analyzed using the concept of negative free energy from statistical physics. An analytical approach is also developed for evaluating the convergence rate of both the incremental and the distributed incremental EM algorithms. It is shown analytically that DIEM converges much faster than DEM. Finally, simulation results confirm that DIEM markedly outperforms DEM on both synthetic and real data sets.
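To make the setting concrete, the following is a minimal sketch of EM for a one-dimensional Gaussian mixture whose data are partitioned over several nodes. It is a generic illustration, not the DEM or DIEM updates derived in the paper, which the abstract does not state. The sketch assumes that each node computes local sufficient statistics in the E-step and that these are simply summed before a common M-step. All function names (`local_statistics`, `m_step`, `distributed_em`), the synthetic data, and the starting values are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def local_statistics(data, weights, means, variances):
    """E-step on one node: per-component sums of r, r*x and r*x^2."""
    k = len(weights)
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in data:
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        for j in range(k):
            r = joint[j] / total  # responsibility of component j for x
            stats[j][0] += r
            stats[j][1] += r * x
            stats[j][2] += r * x * x
    return stats


def m_step(stats, n_total):
    """M-step from the summed statistics of all nodes."""
    weights = [s[0] / n_total for s in stats]
    means = [s[1] / s[0] for s in stats]
    # Floor the variance to avoid a degenerate component (assumed safeguard).
    variances = [max(s[2] / s[0] - m * m, 1e-6) for s, m in zip(stats, means)]
    return weights, means, variances


def distributed_em(node_data, means, n_iter=50):
    """Run EM where only sufficient statistics leave each node (assumed scheme)."""
    k = len(means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    n_total = sum(len(d) for d in node_data)
    for _ in range(n_iter):
        global_stats = [[0.0, 0.0, 0.0] for _ in range(k)]
        for data in node_data:  # each pass of this loop stands for one node
            local = local_statistics(data, weights, means, variances)
            for j in range(k):
                for t in range(3):
                    global_stats[j][t] += local[j][t]
        weights, means, variances = m_step(global_stats, n_total)
    return weights, means, variances


# Synthetic example: four nodes, each holding samples from the same two clusters.
rng = random.Random(0)
node_data = [
    [rng.gauss(-2.0, 0.5) for _ in range(100)] + [rng.gauss(3.0, 0.5) for _ in range(100)]
    for _ in range(4)
]
weights, means, variances = distributed_em(node_data, means=[-1.0, 1.0])
```

Because the sufficient statistics are additive, this scheme gives the same estimates as running EM on the pooled data, up to floating-point rounding. An incremental variant would instead update the parameters after each node's contribution rather than after a full pass over the network.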
Cite this article
Safarinejadian, B., Menhaj, M.B. & Karrari, M. A distributed EM algorithm to estimate the parameters of a finite mixture of components. Knowl Inf Syst 23, 267–292 (2010). https://doi.org/10.1007/s10115-009-0218-y
DOI: https://doi.org/10.1007/s10115-009-0218-y