Clustering spatial data with a hybrid EM approach

Pattern Analysis and Applications

Abstract

In spatial clustering, similarity in the spatial space must be considered in addition to object similarity in the normal attribute space: objects assigned to the same cluster should usually be close to one another spatially. The conventional expectation maximization (EM) algorithm is not suited to spatial clustering because it does not take spatial information into account. Although the neighborhood EM (NEM) algorithm incorporates a spatial penalty term into the criterion function, it requires many more iterations in every E-step. In this paper, we propose a Hybrid EM (HEM) approach that combines EM and NEM. Its computational complexity per pass lies between that of EM and that of NEM. Experiments also show that its clustering quality is better than that of EM and comparable to that of NEM.
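
The abstract does not give the update formulas, so the following is only a minimal sketch of why an NEM-style E-step costs more than a plain EM E-step. It assumes the commonly described NEM formulation, in which each membership is reweighted by exp(beta * sum of the neighbors' memberships) and the update is repeated as a fixed-point iteration. The function names, the 0/1 neighbor lists, the synchronous update order, and the parameters `beta` and `n_inner` are illustrative assumptions, not taken from the paper, and the way HEM combines the two steps is not shown because the abstract does not describe it.

```python
import math

def em_e_step(dens, priors):
    """Plain EM E-step: c[i][k] is proportional to priors[k] * dens[i][k].

    dens[i][k] is the density of object i under cluster k (assumed precomputed).
    """
    post = []
    for row in dens:
        w = [p * d for p, d in zip(priors, row)]
        total = sum(w)
        post.append([v / total for v in w])
    return post

def nem_e_step(dens, priors, neighbors, beta, c, n_inner=10):
    """NEM-style E-step (assumed form): a fixed-point iteration in which each
    object's memberships are boosted toward clusters its neighbors favor.

    neighbors[i] lists the indices of the spatial neighbors of object i;
    beta >= 0 weights the spatial penalty; c holds the starting memberships.
    """
    for _ in range(n_inner):  # inner loop: the extra cost relative to EM
        new_c = []
        for i, row in enumerate(dens):
            w = []
            for k, (p, d) in enumerate(zip(priors, row)):
                support = sum(c[j][k] for j in neighbors[i])
                w.append(p * d * math.exp(beta * support))
            total = sum(w)
            new_c.append([v / total for v in w])
        c = new_c  # synchronous update: all objects use the previous pass
    return c

# Hypothetical toy data: three objects on a line, two clusters.
dens = [[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]]
priors = [0.5, 0.5]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

em_post = em_e_step(dens, priors)
nem_post = nem_e_step(dens, priors, neighbors, beta=1.0, c=em_post)
```

In this toy setting the middle object is ambiguous in attribute space, so plain EM splits it evenly, while the NEM-style update pulls it toward the cluster preferred by its two neighbors. With `beta=0` the spatial term vanishes and the update reduces to the plain EM E-step, which is consistent with a hybrid scheme having a per-pass cost somewhere between the two.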

Cite this article

Hu, T., Sung, S.Y. Clustering spatial data with a hybrid EM approach. Pattern Anal Applic 8, 139–148 (2005). https://doi.org/10.1007/s10044-005-0251-8
