Beyond K-means: Clusters Identification for GIS

Part of the book series: Lecture Notes in Geoinformation and Cartography (LNGC, volume 5)

Abstract

Clustering is an important concept for the analysis of data in GIS. Because such systems can hold very large amounts of data, the time complexity of clustering algorithms is critical. K-means is a popular clustering algorithm for large-scale systems because of its linear complexity. However, K-means requires a priori knowledge of the number of clusters and the subsequent selection of their centroids. We propose a method that lets K-means automatically find the number of clusters and their associated centroids. Moreover, we consider a recursive extension of the algorithm that improves the visibility of the results at different levels of abstraction, in order to support the decision-making process.
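To make the dependence on a priori choices concrete, the following is a minimal sketch of the standard K-means iteration in Python. It is not the method proposed in this chapter, which is not described in the abstract. The function name `kmeans`, its parameters, and the sample data are assumptions chosen for illustration. Note that the caller must supply the starting centroids, and with them the number of clusters, and that each pass costs time proportional to the number of points times the number of clusters.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def kmeans(points, centroids, max_iter=100):
    """Standard K-means sketch (hypothetical helper, not the chapter's method).

    `centroids` is the a priori input: its length fixes the number of
    clusters and its values fix the starting positions.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable, so the iteration has converged
        centroids = new_centroids
    return centroids, clusters


# Illustrative data: two well-separated groups of 2D points.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centroids, final_clusters = kmeans(sample_points, [(0, 0), (10, 10)])
```

With a different number of starting centroids, or with poorly placed ones, the same data can yield a different partition, which is the limitation the abstract refers to.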

Corresponding author

Correspondence to Andreas Hamfelt.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hamfelt, A., Karlsson, M., Thierfelder, T., Valkovsky, V. (2011). Beyond K-means: Clusters Identification for GIS. In: Popovich, V., Claramunt, C., Devogele, T., Schrenk, M., Korolenko, K. (eds) Information Fusion and Geographic Information Systems. Lecture Notes in Geoinformation and Cartography, vol 5. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19766-6_8
