Abstract
Various clustering methods have been applied to climate, ecological, and other environmental datasets, for example to define climate zones and to automate land-use classification. Measuring the “goodness” of such clusters is generally application-dependent and highly subjective, often requiring domain expertise and/or validation with field data (which can be costly or even impossible to acquire). Here we focus on one particular task: the extraction of ocean climate indices from observed climatological data. In this case, the relative performance of different methods can be quantified, because each candidate index can be judged by how well it predicts climate over land. Specifically, we propose to extract indices using complex networks constructed from climate data, which have been shown to capture the dynamical behavior of the global climate system effectively, and we compare their predictive power with that of candidate indices obtained using other popular clustering methods. Our results demonstrate that network-based clusters are statistically significantly better predictors of land climate than the clusters produced by any of the other clustering methods tested, which could lead to a deeper understanding of climate processes and complement physics-based climate models.
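The general pipeline outlined in the abstract (build a network from climate data, cluster it, turn each cluster into a candidate index, then score the indices by how well they predict a land-climate variable) can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes synthetic data, edges drawn between grid points whose anomaly series have a Pearson correlation with |r| ≥ 0.5 (a hypothetical cutoff), connected components as a deliberately simple stand-in for a proper community-detection algorithm, and in-sample least-squares RMSE as a stand-in for the paper's evaluation of predictive power. All function names are hypothetical.

```python
import numpy as np


def build_climate_network(anomalies: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Boolean adjacency matrix linking strongly correlated grid points.

    anomalies: shape (n_points, n_timesteps), one anomaly series per grid point.
    threshold: hypothetical |Pearson r| cutoff (an assumption, not the paper's value).
    """
    corr = np.corrcoef(anomalies)      # (n_points, n_points) Pearson correlations
    adj = np.abs(corr) >= threshold    # keep strong positive or negative links
    np.fill_diagonal(adj, False)       # no self-loops
    return adj


def connected_components(adj: np.ndarray) -> np.ndarray:
    """Label each node with a component id using iterative depth-first search.

    Stand-in for a real community-detection step (assumption for illustration).
    """
    n = adj.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


def cluster_indices(anomalies: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Candidate index per cluster: mean anomaly series over its member grid points."""
    return np.array([anomalies[labels == k].mean(axis=0) for k in np.unique(labels)])


def regression_rmse(indices: np.ndarray, target: np.ndarray) -> float:
    """In-sample RMSE of an ordinary least-squares fit of target on the indices."""
    X = np.column_stack([np.ones(indices.shape[1]), indices.T])  # intercept + indices
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    return float(np.sqrt(np.mean(residuals ** 2)))


# Synthetic example: two hidden ocean "modes", each driving 10 grid points,
# and a land variable that depends on both modes plus noise.
rng = np.random.default_rng(0)
T = 240  # e.g. 20 years of monthly anomalies (synthetic)
signals = rng.standard_normal((2, T))
ocean = np.vstack([
    signals[0] + 0.3 * rng.standard_normal((10, T)),
    signals[1] + 0.3 * rng.standard_normal((10, T)),
])
land_target = 0.8 * signals[0] - 0.5 * signals[1] + 0.2 * rng.standard_normal(T)

adj = build_climate_network(ocean, threshold=0.5)
labels = connected_components(adj)
idx = cluster_indices(ocean, labels)
rmse = regression_rmse(idx, land_target)
print("clusters:", len(np.unique(labels)), "index shape:", idx.shape, "RMSE:", round(rmse, 3))
```

In this toy setup the network recovers the two groups of grid points, and the two cluster-mean indices explain most of the variance of the land variable. Comparing clustering methods would then amount to repeating the last step with indices from each method and testing whether the differences in error are statistically significant.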
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Steinhaeuser, K., Chawla, N.V., Ganguly, A.R. (2011). Comparing Predictive Power in Climate Data: Clustering Matters. In: Pfoser, D., et al. Advances in Spatial and Temporal Databases. SSTD 2011. Lecture Notes in Computer Science, vol 6849. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22922-0_4
DOI: https://doi.org/10.1007/978-3-642-22922-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22921-3
Online ISBN: 978-3-642-22922-0
eBook Packages: Computer Science, Computer Science (R0)