ICEAGE: Interactive Clustering and Exploration of Large and High-Dimensional Geodata

GeoInformatica

Abstract

The unprecedentedly large size and high dimensionality of existing geographic datasets make the complex patterns that potentially lurk in the data hard to find. Clustering is one of the most important techniques for geographic knowledge discovery. However, existing clustering methods have two severe drawbacks for this purpose. First, spatial clustering methods focus on the specific characteristics of distributions in 2- or 3-D space, while general-purpose high-dimensional clustering methods have limited power in recognizing spatial patterns that involve neighbors. Second, clustering methods in general are not geared toward allowing the human-computer interaction needed to effectively tease out complex patterns. In the current paper, an approach is proposed to open up the “black box” of the clustering process for easy understanding, steering, focusing and interpretation, and thus to support an effective exploration of large and high-dimensional geographic data. The proposed approach involves building a hierarchical spatial cluster structure within the high-dimensional feature space, and using this combined space for discovering multi-dimensional (combined spatial and non-spatial) patterns with efficient computational clustering methods and highly interactive visualization techniques. More specifically, this includes the integration of: (1) a hierarchical spatial clustering method to generate a 1-D spatial cluster ordering that preserves the hierarchical cluster structure, and (2) a density- and grid-based technique to effectively support the interactive identification of interesting subspaces and subsequent searching for clusters in each subspace. The proposed approach is implemented in a fully open and interactive manner, supported by various visualization techniques.
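
To make component (1) more concrete, the following is a minimal sketch of how a 1-D ordering can preserve a hierarchical cluster structure. It is not the authors' implementation: it assumes single-linkage merging over all pairwise Euclidean distances, and the function name `spatial_cluster_ordering` is hypothetical. Whenever two clusters merge, their orderings are concatenated, so every cluster at every level of the hierarchy occupies a contiguous run in the final ordering.

```python
from itertools import combinations
from math import dist


def spatial_cluster_ordering(points):
    """Return a 1-D ordering of point indices from single-linkage merging.

    Assumption (not from the paper): clusters are merged along the shortest
    remaining edge, Kruskal-style, and merged clusters are concatenated.
    """
    n = len(points)
    if n == 0:
        return []
    # All pairwise edges, shortest first.
    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))               # union-find forest
    chain = {i: [i] for i in range(n)}    # ordered members of each cluster root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        # Concatenate the two chains so each cluster stays contiguous.
        chain[ri] = chain[ri] + chain.pop(rj)
        parent[rj] = ri
    return chain[find(0)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 0.5)]
    print(spatial_cluster_ordering(pts))
```

In this toy input the three points near the origin and the two points near (10, 10) each end up as one contiguous run, which is the property that lets a 1-D ordering be used as a single "spatial" dimension alongside the non-spatial attributes. The all-pairs edge list is quadratic in the number of points, so this is only suitable for illustration on small inputs.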

Cite this article

Guo, D., Peuquet, D.J. & Gahegan, M. ICEAGE: Interactive Clustering and Exploration of Large and High-Dimensional Geodata. GeoInformatica 7, 229–253 (2003). https://doi.org/10.1023/A:1025101015202
