Abstract
Spatial autocorrelation is the correlation among data values that is due strictly to the relative spatial proximity of the objects the data refer to. This statistical property violates the assumption of independent observations, a precondition of most data mining and statistical models. Ignoring spatial dependencies can therefore obscure important insights. In this paper, we propose a data mining method that explicitly considers autocorrelation when building models. The method is based on the concept of predictive clustering trees (PCTs). The proposed approach can capture both global and local effects and deals with positive spatial autocorrelation. The discovered models adapt to local properties of the data while providing spatially smoothed predictions. Results show the effectiveness of the proposed solution.
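To make the notion of spatial autocorrelation concrete, the sketch below computes global Moran's I, one common measure of it. The abstract does not say which measure the proposed method uses, so the choice of Moran's I, the binary distance-band weights, the helper name `morans_i`, and the toy data are all assumptions made for illustration only.

```python
import math


def morans_i(points, values, bandwidth):
    """Global Moran's I with binary distance-band weights.

    Hypothetical helper for illustration: two distinct observations are
    treated as neighbours (weight 1) when their Euclidean distance is at
    most `bandwidth`, and as non-neighbours (weight 0) otherwise.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0   # sum over neighbour pairs of dev[i] * dev[j]
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= bandwidth:
                cross += dev[i] * dev[j]
                w_sum += 1.0

    variance_term = sum(d * d for d in dev)
    if w_sum == 0 or variance_term == 0:
        raise ValueError("Moran's I is undefined: no neighbours or constant values")
    return (n / w_sum) * cross / variance_term


# Toy data (assumed): four sites on a line, one unit apart.
sites = [(0, 0), (1, 0), (2, 0), (3, 0)]

# Similar values sit next to each other: positive autocorrelation.
clustered = morans_i(sites, [1, 1, 5, 5], bandwidth=1.0)

# Neighbouring values always differ: negative autocorrelation.
alternating = morans_i(sites, [1, 5, 1, 5], bandwidth=1.0)

print(clustered, alternating)
```

A value above zero means that nearby sites tend to carry similar values, which is the positive spatial autocorrelation the abstract refers to. A value below zero means that neighbouring sites tend to differ.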
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stojanova, D., Ceci, M., Appice, A., Malerba, D., Džeroski, S. (2011). Global and Local Spatial Autocorrelation in Predictive Clustering Trees. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science, vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_25
DOI: https://doi.org/10.1007/978-3-642-24477-3_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24476-6
Online ISBN: 978-3-642-24477-3
eBook Packages: Computer Science, Computer Science (R0)