Spatial ordering and encoding for geographic data mining and visualization

Guo, Diansheng; Gahegan, Mark

doi:10.1007/s10844-006-9952-8

Published: 21 November 2006

Journal of Intelligent Information Systems, Volume 27, pages 243–266 (2006)

Abstract

Geographic information (e.g., locations, networks, and nearest neighbors) is unique and differs from aspatial attributes (e.g., population, sales, or income). A challenging problem in spatial data mining and visualization is to account for both the geographic information and multiple aspatial variables when detecting patterns. To tackle this problem, we present and evaluate a variety of spatial ordering methods that transform spatial relations into a one-dimensional ordering and encoding that preserve spatial locality as much as possible. The ordering can then be used to spatially sort temporal or multivariate data series and thus help reveal patterns across different spaces. The encoding, as a materialization of spatial clusters and neighboring relations, is also amenable to processing together with aspatial variables by any existing (non-spatial) data mining method. We design a set of measures to evaluate nine different ordering/encoding methods, including two space-filling curves, six hierarchical-clustering-based methods, and a one-dimensional Sammon mapping (a multidimensional scaling approach). Evaluation results on various data distributions show that the optimal ordering/encoding with complete-linkage clustering consistently gives the best overall performance, surpassing well-known space-filling curves in preserving spatial locality. Moreover, clustering-based methods can encode not only simple geographic locations, such as x and y coordinates, but also a wide range of other spatial relations, such as network distances or arbitrarily weighted graphs.
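The abstract contrasts space-filling curves with clustering-based orderings. Below is a minimal Python sketch of both ideas for 2-D points. It assumes a Hilbert curve as the space-filling curve (the abstract does not name which two curves the paper evaluates). For the clustering side it uses naive complete-linkage agglomeration. When two clusters merge, a greedy rule picks whichever of the four end-to-end orientations puts the closest pair of points at the seam. This is a cheap stand-in and not the optimal leaf ordering the paper reports. All function names (`hilbert_index`, `hilbert_order`, `complete_linkage_order`, `mean_step_length`) and the mean-step-length locality score are hypothetical illustrations. They are not the authors' code or evaluation measures.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _rot(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a quadrant so its sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a Hilbert curve filling an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_order(points: Sequence[Point], bits: int = 8) -> List[int]:
    """Snap points to a 2**bits grid and return their indices sorted by Hilbert distance."""
    n = 1 << bits
    x0 = min(p[0] for p in points)
    y0 = min(p[1] for p in points)
    span = max(max(p[0] for p in points) - x0, max(p[1] for p in points) - y0) or 1.0

    def cell(v: float, v0: float) -> int:
        # Scale into [0, n - 1]; the maximum coordinate is clamped into the last cell.
        return min(n - 1, int((v - v0) / span * n))

    return sorted(range(len(points)),
                  key=lambda i: hilbert_index(n, cell(points[i][0], x0), cell(points[i][1], y0)))


def complete_linkage_order(points: Sequence[Point]) -> List[int]:
    """Leaf order from naive complete-linkage clustering with greedy leaf orientation."""
    def d(i: int, j: int) -> float:
        return math.dist(points[i], points[j])

    def linkage(a: List[int], b: List[int]) -> float:
        # Complete linkage: distance between the two farthest members.
        return max(d(i, j) for i in a for j in b)

    clusters = [[i] for i in range(len(points))]  # each cluster keeps its own leaf order
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                link = linkage(clusters[p], clusters[q])
                if best is None or link < best[0]:
                    best = (link, p, q)
        _, p, q = best
        a, b = clusters[p], clusters[q]
        # Try all four end-to-end orientations; keep the one with the shortest seam.
        candidates = [a + b, a + b[::-1], a[::-1] + b, a[::-1] + b[::-1]]
        merged = min(candidates, key=lambda c: d(c[len(a) - 1], c[len(a)]))
        del clusters[q]  # q > p, so index p stays valid
        clusters[p] = merged
    return clusters[0]


def mean_step_length(points: Sequence[Point], order: Sequence[int]) -> float:
    """Average spatial distance between consecutive items; lower means better locality."""
    steps = [math.dist(points[a], points[b]) for a, b in zip(order, order[1:])]
    return sum(steps) / len(steps)
```

On a 4 x 4 integer grid with `bits=2`, `hilbert_order` visits cells exactly one unit apart, so `mean_step_length` returns 1.0. The clustering sketch runs in roughly cubic time in the number of points, so it only suits small examples. Because it works from pairwise distances alone, `d` could be swapped for a network or graph distance. That mirrors the abstract's point that clustering-based encodings are not limited to x and y coordinates.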

Author information

Authors and Affiliations

Department of Geography, University of South Carolina, 709 Bull Street, Columbia, SC, 29208, USA
Diansheng Guo
GeoVISTA Center, Department of Geography, Pennsylvania State University, 302 Walker, University Park, PA, 16802, USA
Mark Gahegan

Corresponding author

Correspondence to Diansheng Guo.

About this article

Cite this article

Guo, D., Gahegan, M. Spatial ordering and encoding for geographic data mining and visualization. J Intell Inf Syst 27, 243–266 (2006). https://doi.org/10.1007/s10844-006-9952-8

Issue Date: November 2006
