Comparing Predictive Power in Climate Data: Clustering Matters

  • Conference paper
Advances in Spatial and Temporal Databases (SSTD 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6849)

Abstract

Various clustering methods have been applied to climate, ecological, and other environmental datasets, for example to define climate zones or to automate land-use classification. Measuring the “goodness” of such clusters is generally application-dependent and highly subjective, often requiring domain expertise and/or validation with field data (which can be costly or even impossible to acquire). Here we focus on one particular task: the extraction of ocean climate indices from observed climatological data. In this case, it is possible to quantify the relative performance of different methods. Specifically, we propose to extract indices with complex networks constructed from climate data, which have been shown to effectively capture the dynamical behavior of the global climate system, and to compare their predictive power with that of candidate indices obtained using other popular clustering methods. Our results demonstrate that network-based clusters are statistically significantly better predictors of land climate than clusters obtained with any of the other clustering methods, which could lead to a deeper understanding of climate processes and complement physics-based climate models.
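The pipeline outlined in the abstract (build a network from climate time series, group its nodes into clusters, treat each cluster average as a candidate index, then score the indices by how well they predict a land series) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the correlation threshold of 0.5, the use of connected components as a simple stand-in for a proper community detection algorithm, and the least-squares regression with a hold-out split used to measure predictive power.

```python
import numpy as np


def correlation_network(series, threshold=0.5):
    """Adjacency matrix linking grid points whose time series correlate strongly.

    series: array of shape (n_points, n_time). The threshold is an assumed value.
    """
    corr = np.corrcoef(series)
    adj = np.abs(corr) >= threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def connected_components(adj):
    """Label each node with its connected component (stand-in for community detection)."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


def cluster_indices(series, labels):
    """One candidate index per cluster: the mean time series of its members."""
    return np.vstack([series[labels == k].mean(axis=0) for k in np.unique(labels)])


def holdout_rmse(indices, target, n_train):
    """Fit a linear model on the first n_train steps, return RMSE on the rest."""
    X = np.column_stack([indices.T, np.ones(indices.shape[1])])  # add intercept
    coef, *_ = np.linalg.lstsq(X[:n_train], target[:n_train], rcond=None)
    pred = X[n_train:] @ coef
    return float(np.sqrt(np.mean((pred - target[n_train:]) ** 2)))
```

Under these assumptions, running the same `holdout_rmse` on indices produced by different clustering methods gives one number per method, which is the kind of quantity that makes a relative comparison possible; a lower hold-out error indicates higher predictive power.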

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Steinhaeuser, K., Chawla, N.V., Ganguly, A.R. (2011). Comparing Predictive Power in Climate Data: Clustering Matters. In: Pfoser, D., et al. Advances in Spatial and Temporal Databases. SSTD 2011. Lecture Notes in Computer Science, vol 6849. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22922-0_4

  • DOI: https://doi.org/10.1007/978-3-642-22922-0_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22921-3

  • Online ISBN: 978-3-642-22922-0

  • eBook Packages: Computer Science, Computer Science (R0)
