Abstract
Geographic information (e.g., locations, networks, and nearest neighbors) is unique and differs from aspatial attributes (e.g., population, sales, or income). Taking into account both the geographic information and multiple aspatial variables when detecting patterns is a challenging problem in spatial data mining and visualization. To tackle this problem, we present and evaluate a variety of spatial ordering methods that transform spatial relations into a one-dimensional ordering and encoding that preserves spatial locality as much as possible. The ordering can then be used to spatially sort temporal or multivariate data series and thus help reveal patterns across different spaces. The encoding, as a materialization of spatial clusters and neighboring relations, is also amenable to processing together with aspatial variables by any existing (non-spatial) data mining method. We design a set of measures to evaluate nine different ordering/encoding methods, including two space-filling curves, six hierarchical-clustering-based methods, and a one-dimensional Sammon mapping (a multidimensional scaling approach). Evaluation results with various data distributions show that the optimal ordering/encoding with complete-linkage clustering consistently gives the best overall performance, surpassing well-known space-filling curves in preserving spatial locality. Moreover, clustering-based methods can encode not only simple geographic locations, e.g., x and y coordinates, but also a wide range of other spatial relations, e.g., network distances or arbitrarily weighted graphs.
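To illustrate what a one-dimensional spatial ordering looks like in practice, the following minimal sketch sorts 2D points along a Morton (Z-order) space-filling curve. This is an illustration only: the abstract does not say which two space-filling curves were evaluated, and the function names (`interleave_bits`, `morton_order`), the 16-bit grid resolution, and the rescaling step are assumptions made for this example, not the authors' implementation.

```python
def interleave_bits(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of ix and iy into one Morton key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)      # x bits occupy even positions
        key |= ((iy >> b) & 1) << (2 * b + 1)  # y bits occupy odd positions
    return key


def morton_order(points, bits: int = 16):
    """Return the indices of `points` sorted along a Z-order curve.

    points: list of (x, y) floats. Coordinates are rescaled onto an integer
    grid with 2**bits cells per axis before the bits are interleaved
    (an assumption of this sketch).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # One common scale for both axes keeps the grid cells square.
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    cells = (1 << bits) - 1

    def key(i):
        ix = int((points[i][0] - min_x) / extent * cells)
        iy = int((points[i][1] - min_y) / extent * cells)
        return interleave_bits(ix, iy, bits)

    return sorted(range(len(points)), key=key)


# Four corners of a unit square, deliberately listed out of curve order.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
order = morton_order(points)
print(order)  # [0, 3, 2, 1], i.e. (0,0), (1,0), (0,1), (1,1)
```

The resulting index list can be used to sort the rows of a multivariate or temporal data table so that rows that are adjacent in the ordering tend to be spatially close. A curve like this works only on coordinates, whereas the clustering-based methods described in the abstract can also start from network distances or weighted graphs.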
Cite this article
Guo, D., Gahegan, M. Spatial ordering and encoding for geographic data mining and visualization. J Intell Inf Syst 27, 243–266 (2006). https://doi.org/10.1007/s10844-006-9952-8
DOI: https://doi.org/10.1007/s10844-006-9952-8