Abstract
Mining geo-spatial data is an important task in many application domains, such as environmental science, geographic information science, and social networks. In this paper, we introduce a data mining framework that covers pre-processing of environmental and geo-spatial data, geo-spatial data mining techniques, and visual analysis of environmental and geo-spatial data. In particular, we propose new density-based clustering algorithms to identify interesting distribution patterns in geo-spatial data, a change pattern discovery technique to detect dynamic change patterns within spatial clusters, and a post-processing technique to extract interesting patterns and useful knowledge from geo-spatial data. Our clustering algorithms build on the well-established density-based shared nearest neighbor clustering algorithm, which can find clusters of different shapes, sizes, and densities in high-dimensional data. The post-processing technique allows automatic screening of interesting spatial clusters. The change pattern discovery algorithm detects and analyzes dynamic patterns of change within spatial clusters. The focus of this paper is a framework that integrates these steps into one data mining process: a clustering algorithm, a post-processing analysis technique, and a change pattern discovery algorithm. In contrast to previous work in this area, our approaches can cluster and analyze dynamically evolving complex objects, i.e., polygons. We evaluate the effectiveness of our techniques through a challenging real-world case study involving ozone pollution events in the Houston–Galveston–Brazoria area. The experimental results show that our approaches can discover interesting patterns and useful information from geo-spatial air-quality data.
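To make the shared nearest neighbor idea mentioned in the abstract more concrete, the following is a minimal sketch of density-based shared nearest neighbor clustering. It is not the authors' algorithm: it assumes plain 2D points with Euclidean distance rather than polygons, and the function names (`knn_lists`, `snn_similarity`, `snn_cluster`) and parameters (`k`, `eps`, `min_pts`) are hypothetical names chosen for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of shared neighbors, counted only if i and j list each other."""
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


def snn_cluster(points, k, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise (illustrative only)."""
    n = len(points)
    nbrs = knn_lists(points, k)
    sim = [[snn_similarity(nbrs, i, j) if i != j else 0 for j in range(n)]
           for i in range(n)]

    # SNN density: how many points are at least eps-similar to point i.
    density = [sum(1 for j in range(n) if sim[i][j] >= eps) for i in range(n)]
    core = [d >= min_pts for d in density]

    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if not core[start] or labels[start] != -1:
            continue
        # Grow a cluster by linking core points that are eps-similar.
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if core[j] and labels[j] == -1 and sim[i][j] >= eps:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1

    # Attach each non-core point to its most similar core point, if any
    # core point is at least eps-similar; otherwise it stays noise.
    core_ids = [j for j in range(n) if core[j]]
    for i in range(n):
        if labels[i] == -1 and core_ids:
            best = max(core_ids, key=lambda j: sim[i][j])
            if sim[i][best] >= eps:
                labels[i] = labels[best]
    return labels


# Two compact groups and one far-away outlier (made-up coordinates).
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(snn_cluster(pts, k=3, eps=2, min_pts=3))
# prints [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because similarity is measured in shared neighbors rather than raw distance, the same `eps` can apply to dense and sparse regions alike, which is what lets this family of methods handle clusters of differing density. Applying the idea to polygons would require replacing `dist` with a polygon dissimilarity function, which this sketch does not attempt.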
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Wang, S., Eick, C.F. A data mining framework for environmental and geo-spatial data analysis. Int J Data Sci Anal 5, 83–98 (2018). https://doi.org/10.1007/s41060-017-0075-9
DOI: https://doi.org/10.1007/s41060-017-0075-9