Beyond K-means: Clusters Identification for GIS

Hamfelt, Andreas; Karlsson, Mikael; Thierfelder, Tomas; Valkovsky, Vladislav

doi:10.1007/978-3-642-19766-6_8

Part of the book series: Lecture Notes in Geoinformation and Cartography (LNGC, volume 5)

Abstract

Clustering is an important concept for the analysis of data in GIS. Because such systems can hold very large amounts of data, the time complexity of clustering algorithms is critical. K-means is a popular clustering algorithm for large-scale systems because of its linear complexity. However, K-means requires a priori knowledge of the number of clusters and a subsequent selection of their centroids. We propose a method that lets K-means find the number of clusters and their associated centroids automatically. Moreover, we consider a recursive extension of the algorithm that improves the visibility of the results at different levels of abstraction, in order to support the decision-making process.
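To make the limitation named in the abstract concrete, the following is a minimal sketch of the standard K-means iteration (Lloyd-style), not the method proposed in the chapter, which the abstract does not describe. The function name `kmeans`, its parameters, and the sample points are illustrative assumptions. Note that the caller must supply the initial centroids, and therefore the number of clusters, up front; each pass costs time proportional to the number of points for a fixed number of clusters.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Standard K-means sketch (hypothetical helper, not the chapter's method).

    The initial centroids are an input, so k = len(centroids) is fixed
    in advance: exactly the a priori knowledge the abstract refers to.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(old)
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, clusters


# Illustrative data: two well-separated groups of 2D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

With a different number of initial centroids the same data would be partitioned differently, which is why choosing the number of clusters and the starting centroids automatically matters for large GIS data sets.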

Author information

Authors and Affiliations

Informatics and Media, Uppsala University, 513, 75120, Uppsala, Sweden
Andreas Hamfelt & Vladislav Valkovsky
Eins SAP Consulting, Bellmansgatan 2, 11820, Stockholm, Sweden
Mikael Karlsson
Department of Energy and Technology, Swedish University of Agricultural Sciences, 7032, 75007, Uppsala, Sweden
Tomas Thierfelder

Corresponding author

Correspondence to Andreas Hamfelt.

Editor information

Editors and Affiliations

SPIIRAS 39, 14th Line, V.O., St. Petersburg, 199178, Russian Federation
Vasily V. Popovich
Naval Academy Research Institute, Brest naval, 29240, France
Christophe Claramunt
Naval Academy Research Institute, Brest naval, 29240, France
Thomas Devogele
Inst. Stadt, Verkehr, Umwelt und, Informationsgesellschaft, CEIT ALANOVA gemeinnützige GmbH, Am Concorde Park 2, Schwechat, 2320, Austria
Manfred Schrenk
B1320, NAVSEA, Chief Scientist/NUWC Code 1543, Howell St. 1176, Newport, 02841-1708, USA
Kyrill Korolenko

About this chapter

Cite this chapter

Hamfelt, A., Karlsson, M., Thierfelder, T., Valkovsky, V. (2011). Beyond K-means: Clusters Identification for GIS. In: Popovich, V., Claramunt, C., Devogele, T., Schrenk, M., Korolenko, K. (eds) Information Fusion and Geographic Information Systems. Lecture Notes in Geoinformation and Cartography, vol 5. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19766-6_8

DOI: https://doi.org/10.1007/978-3-642-19766-6_8
Published: 05 May 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19765-9
Online ISBN: 978-3-642-19766-6
eBook Packages: Earth and Environmental Science; Earth and Environmental Science (R0)
