Abstract
Clustering is an important technique for analyzing data in geographic information systems (GIS). Because such systems can hold very large amounts of data, the time complexity of clustering algorithms is critical. K-means is a popular clustering algorithm for large-scale systems because of its linear complexity. However, K-means requires a priori knowledge of the number of clusters and a subsequent selection of their centroids. We propose a method that lets K-means automatically find the number of clusters and their associated centroids. Moreover, we consider a recursive extension of the algorithm that improves the visibility of the results at different levels of abstraction, in order to support the decision-making process.
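To make the limitation concrete, the following is a minimal sketch of standard (Lloyd-style) K-means, not the method proposed in this chapter, which the abstract does not detail. The function names `nearest` and `kmeans`, the random choice of initial centroids, and the sample points are all assumptions made for illustration. Each iteration makes one pass over the data with k distance checks per point, which is where the linear cost in the number of points comes from, and the caller has to supply k up front.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, max_iter=100, seed=0):
    """Standard Lloyd-style K-means. The caller must choose k in advance."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points picked at random.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: one pass over the data, k distance checks per point.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its group.
        # An empty group keeps its previous centroid.
        updated = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:  # no centroid moved, so the assignment is stable
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Both `k` and the starting centroids are inputs here, and the result can depend on them; these are the two choices the abstract says the proposed method determines automatically.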
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hamfelt, A., Karlsson, M., Thierfelder, T., Valkovsky, V. (2011). Beyond K-means: Clusters Identification for GIS. In: Popovich, V., Claramunt, C., Devogele, T., Schrenk, M., Korolenko, K. (eds) Information Fusion and Geographic Information Systems. Lecture Notes in Geoinformation and Cartography, vol 5. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19766-6_8
DOI: https://doi.org/10.1007/978-3-642-19766-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19765-9
Online ISBN: 978-3-642-19766-6
eBook Packages: Earth and Environmental Science (R0)