Abstract
In spatial clustering, in addition to the object similarity in the normal attribute space, similarity in the spatial space needs to be considered and objects assigned to the same cluster should usually be close to one another in the spatial space. The conventional expectation maximization (EM) algorithm is not suited for spatial clustering because it does not consider spatial information. Although neighborhood EM (NEM) algorithm incorporates a spatial penalty term to the criterion function, it involves much more iterations in every E-step. In this paper, we propose a Hybrid EM (HEM) approach that combines EM and NEM. Its computational complexity for every pass is between EM and NEM. Experiments also show that its clustering quality is better than EM and comparable to NEM.
Similar content being viewed by others
References
Ambroise C, Govaert G (1998) Convergence of an EM-type algorithm for spatial clustering. Pattern Recognit Lett 19(10):919–927
Basu S, Bilenko M, Mooney RJ (2004) A probabilistic framework for semi-supervised clustering. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 59–68
Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14(3):315–332
Cressie NA (1993) Statistics for spatial data. Wiley, New York
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B(39):1–38
Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of 2nd international conference on knowledge discovery and data mining, pp 226–231
Estivill-Castro V, Lee I (2001) Fast spatial clustering with different metrics and in the presence of obstacles. In: Proceedings of the 9th ACM international symposium on aAdvances in geographic information systems, pp 142 – 147
Friedman N (1998) The Bayesian structural EM algorithm. In: Proceedings of the 14th conference on uncertainty in artificial intelligence, pp 129–138
Geman S, Geman G (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans Pattern Anal Machine Intell 6:721–741
Gilley OW, Pace RK (1996) On the harrison and rubinfeld data. J Environ Econ Management 31:403–405
Guo D, Peuquet D, Gahegan M (2002) Opening the black box: Interactive hierarchical clustering for multivariate spatial patterns. In: Proceedings of the 10th ACM international symposium on advances in geographic information systems, pp 131 – 136
Hathaway RJ (1986) Another interpretation of the EM algorithm for mixture distributions. Stat Probabil Lett 4:53–56
Jain AK, Farrokhnia F (1991) Unsupervised texture segmentation using Gabor filters. Pattern Recognit 24(12):1167–1186
Karypis G, Han EH, Kumar V (1999) CHAMELEON: a hierarchical clustering algorithm using dynamic modeling. Computer 32(8):68–75
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Legendre P (1987) Constrained clustering. In: Legendre P, Legendre L (eds) Developments in numerical ecology, pp 289–307. NATO ASI Series G 14
LeSage JP (1999) MATLAB Toolbox for spatial econometrics. http://www.spatial-econometrics.com
Mceliece R (1977) Theory of information and coding. Addison-Wesley, Reading
Murphy PM, Aha DW (1994) UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California at Irvine, http://www.ics.uci.edu/~ mlearn/MLRepository.html
Neal R, Hinton G (1998) A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan M (ed) Learning in graphical models. Kluwer, Dordrecht, pp 355–368
Neukirchen C, Rottland J, Willett D, Rigoll G (2001) A continuous density interpretation of discrete HMM systems and MMI-neural networks. IEEE Trans Speech Audio Processing 9(4):367–377
Ng R, Han J (2002) CLARANS: A method for clustering objects for spatial data mining. IEEE Trans Knowl Data Eng 14(5):1003–1016
Pena JM, Lozano JA, Larranaga P (2000) An improved Bayesian structural EM algorithm for learning Bayesian networks for clustering. Pattern Recognit Lett 21(8):779–786
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
Rasson JP, Granville V (1995) Multivariate discriminant analysis and maximum penalized likelihood density estimation. J R Stat Soc B(57):501–517
Tung A, Hou J, Han J (2001) Spatial clustering in the presence of obstacles. In: Proceedings of 17th international conference on data engineering, pp 359–367
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Hu, T., Sung, S.Y. Clustering spatial data with a hybrid EM approach. Pattern Anal Applic 8, 139–148 (2005). https://doi.org/10.1007/s10044-005-0251-8
Received:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10044-005-0251-8