
A data mining framework for environmental and geo-spatial data analysis

  • Regular Paper

International Journal of Data Science and Analytics

Abstract

Mining geo-spatial data is an important task in many application domains, such as environmental science, geographic information science, and social networks. In this paper, we introduce a data mining framework that covers the pre-processing of environmental and geo-spatial data, geo-spatial data mining techniques, and the visual analysis of environmental and geo-spatial data. In particular, we propose new density-based clustering algorithms to identify interesting distribution patterns in geo-spatial data, a change pattern discovery technique to detect dynamic change patterns within spatial clusters, and a post-processing technique to extract interesting patterns and useful knowledge from geo-spatial data. Our density-based clustering algorithms build on the well-established density-based shared nearest neighbor (SNN) clustering algorithm, which can find clusters of different shapes, sizes, and densities in high-dimensional data. The post-processing analysis technique allows automatic screening of interesting spatial clusters. The change pattern discovery algorithm is able to detect and analyze dynamic patterns of change within spatial clusters. This paper focuses on developing a framework that integrates a sequence of data mining processes: clustering, post-processing analysis, and change pattern discovery. In contrast to previous work in this area, our approaches can cluster and analyze dynamically evolving complex objects, i.e., polygons. We evaluate the effectiveness of our techniques through a challenging real-world case study involving ozone pollution events in the Houston–Galveston–Brazoria area. The experimental results show that our approaches can discover interesting patterns and useful information in geo-spatial air-quality data.
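The clustering algorithms above build on shared nearest neighbor (SNN) density-based clustering, originally proposed by Ertöz, Steinbach, and Kumar. As a rough illustration of that baseline idea only, the sketch below clusters plain 2-D points: two points are similar in proportion to how many k-nearest neighbors they share, a point's SNN density counts its sufficiently similar neighbors, and dense "core" points are linked into clusters. It is not the authors' polygon-based extension. The function names, the parameters `k`, `eps`, and `min_pts`, and the brute-force O(n²) neighbor search are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def knn_lists(points: Sequence[Point], k: int) -> List[Set[int]]:
    """For each point, return the indices of its k nearest neighbors (excluding itself)."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in order[:k]})
    return neighbors


def snn_similarity(nbrs: List[Set[int]], i: int, j: int) -> int:
    """Number of shared neighbors; defined only for mutual k-nearest neighbors."""
    if j in nbrs[i] and i in nbrs[j]:
        return len(nbrs[i] & nbrs[j])
    return 0


def snn_cluster(points: Sequence[Point], k: int, eps: int, min_pts: int) -> List[int]:
    """Hypothetical SNN clustering sketch. Returns a cluster label per point, -1 for noise."""
    n = len(points)
    nbrs = knn_lists(points, k)
    sim = [[snn_similarity(nbrs, i, j) if i != j else 0 for j in range(n)]
           for i in range(n)]

    # SNN density: how many points are at least eps-similar to point i.
    density = [sum(1 for j in range(n) if sim[i][j] >= eps) for i in range(n)]
    core = [i for i in range(n) if density[i] >= min_pts]

    # Link core points whose SNN similarity reaches eps (connected components).
    labels = [-1] * n
    cluster_id = 0
    for c in core:
        if labels[c] != -1:
            continue
        labels[c] = cluster_id
        stack = [c]
        while stack:
            u = stack.pop()
            for v in core:
                if labels[v] == -1 and sim[u][v] >= eps:
                    labels[v] = cluster_id
                    stack.append(v)
        cluster_id += 1

    # Attach each non-core point to its most similar core point, if similar enough;
    # otherwise it stays labeled as noise (-1).
    core_set = set(core)
    for i in range(n):
        if i in core_set or not core:
            continue
        best = max(core, key=lambda c: sim[i][c])
        if sim[i][best] >= eps:
            labels[i] = labels[best]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
           (5, 5)]
    print(snn_cluster(pts, k=4, eps=2, min_pts=3))
```

With two tight groups and one isolated point, the sketch is expected to print `[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]`, so the isolated point is reported as noise. Because similarity depends on shared neighborhoods rather than raw distance, this style of clustering tends to cope with clusters of differing densities, which is the property the abstract cites.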

Author information

Corresponding author

Correspondence to Sujing Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Wang, S., Eick, C.F. A data mining framework for environmental and geo-spatial data analysis. Int J Data Sci Anal 5, 83–98 (2018). https://doi.org/10.1007/s41060-017-0075-9

  • DOI: https://doi.org/10.1007/s41060-017-0075-9
