Spatial ordering and encoding for geographic data mining and visualization

Journal of Intelligent Information Systems

Abstract

Geographic information (e.g., locations, networks, and nearest neighbors) is unique and differs from aspatial attributes (e.g., population, sales, or income). It is a challenging problem in spatial data mining and visualization to take both the geographic information and multiple aspatial variables into account when detecting patterns. To tackle this problem, we present and evaluate a variety of spatial ordering methods that transform spatial relations into a one-dimensional ordering and encoding that preserves spatial locality as much as possible. The ordering can then be used to spatially sort temporal or multivariate data series and thus help reveal patterns across different spaces. The encoding, as a materialization of spatial clusters and neighboring relations, is also amenable to processing together with aspatial variables by any existing (non-spatial) data mining method. We design a set of measures to evaluate nine different ordering/encoding methods, including two space-filling curves, six hierarchical clustering based methods, and a one-dimensional Sammon mapping (a multidimensional scaling approach). Evaluation results with various data distributions show that the optimal ordering/encoding with complete-linkage clustering consistently gives the best overall performance, surpassing well-known space-filling curves in preserving spatial locality. Moreover, clustering-based methods can encode not only simple geographic locations, e.g., x and y coordinates, but also a wide range of other spatial relations, e.g., network distances or arbitrarily weighted graphs.
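To make the idea of a one-dimensional spatial ordering concrete, the following is a minimal sketch of a space-filling-curve ordering. It assumes the Morton (Z-order) curve purely as an illustration; the abstract does not name the two curves that were evaluated. The function names (`interleave_bits`, `morton_order`, `path_length`) are hypothetical, and the path-length score is an assumed stand-in for a locality measure, not one of the measures designed in the article.

```python
from math import dist


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) key: interleave the low `bits` bits of x and y."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits occupy even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits occupy odd positions
    return key


def morton_order(points, bits: int = 16):
    """Return point indices sorted by Morton key.

    Coordinates are first rescaled onto a (2**bits) x (2**bits) integer grid,
    using one common scale so the aspect ratio is kept.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0  # avoid divide-by-zero
    scale = (2 ** bits - 1) / span
    keys = [
        interleave_bits(int((x - min_x) * scale), int((y - min_y) * scale), bits)
        for x, y in points
    ]
    return sorted(range(len(points)), key=keys.__getitem__)


def path_length(points, order):
    """Total Euclidean length of the path visiting points in `order`.

    Assumed illustrative score: a shorter path means that items adjacent
    in the ordering also tend to be close in space.
    """
    return sum(dist(points[a], points[b]) for a, b in zip(order, order[1:]))
```

Given a list of `(x, y)` tuples, `morton_order(points)` yields a permutation that can be used to sort the rows of a multivariate or temporal data table spatially, and `path_length(points, order)` gives a rough way to compare that permutation against another ordering of the same points.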

Author information

Corresponding author

Correspondence to Diansheng Guo.

About this article

Cite this article

Guo, D., Gahegan, M. Spatial ordering and encoding for geographic data mining and visualization. J Intell Inf Syst 27, 243–266 (2006). https://doi.org/10.1007/s10844-006-9952-8

  • DOI: https://doi.org/10.1007/s10844-006-9952-8
