Abstract
This chapter describes how we used regression rules to improve upon results previously published in the Earth science literature. In such a scientific application of machine learning, it is crucially important for the learned models to be understandable and communicable. We recount how we selected a learning algorithm to maximize communicability, and then describe two visualization techniques that we developed to aid in understanding the model by exploiting the spatial nature of the data. We also report how evaluating the learned models across time let us discover an error in the data.
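The abstract does not spell out what a regression-rule model looks like. The following is a minimal sketch of that general model form, not the authors' learned model or any tool's API. It assumes that each rule pairs a conjunction of threshold tests with a linear model. It also assumes that when several rules cover a record, their predictions are averaged, which is one common convention and an assumption here. The class and function names, the feature names `temperature` and `precipitation`, and all thresholds and coefficients are hypothetical. The `describe` method shows why this form suits communicability: every rule can be printed as a short, readable statement.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Dict[str, float]
# Hypothetical condition format: (feature name, "<=" or ">", threshold).
Condition = Tuple[str, str, float]


@dataclass
class RegressionRule:
    """IF all conditions hold THEN predict with a linear model."""
    conditions: List[Condition]
    intercept: float
    coefficients: Dict[str, float] = field(default_factory=dict)

    def applies(self, x: Record) -> bool:
        # The rule covers x only if every threshold test is satisfied.
        for feature, op, threshold in self.conditions:
            value = x[feature]
            if op == "<=":
                ok = value <= threshold
            elif op == ">":
                ok = value > threshold
            else:
                raise ValueError(f"unsupported operator {op!r}")
            if not ok:
                return False
        return True

    def predict(self, x: Record) -> float:
        # Linear model attached to this rule.
        return self.intercept + sum(w * x[f] for f, w in self.coefficients.items())

    def describe(self) -> str:
        # Render the rule as human-readable text.
        cond = " and ".join(f"{f} {op} {t:g}" for f, op, t in self.conditions) or "true"
        terms = " + ".join(f"{w:g}*{f}" for f, w in self.coefficients.items())
        model = f"{self.intercept:g}" + (f" + {terms}" if terms else "")
        return f"if {cond} then y = {model}"


def predict_rule_set(rules: List[RegressionRule], x: Record, default: float) -> float:
    """Average the predictions of all rules covering x; use default if none do."""
    firing = [r.predict(x) for r in rules if r.applies(x)]
    if not firing:
        return default
    return sum(firing) / len(firing)


# Hypothetical two-rule model and one hypothetical site.
rules = [
    RegressionRule([("precipitation", "<=", 500.0)], 10.0, {"temperature": 2.0}),
    RegressionRule([("temperature", ">", 15.0)], 50.0, {"precipitation": 0.25}),
]
site = {"temperature": 20.0, "precipitation": 400.0}

for rule in rules:
    print(rule.describe())
print(predict_rule_set(rules, site, default=0.0))
```

Both hypothetical rules cover this site (predictions 50 and 150), so the averaged prediction is 100.0. A site covered by no rule falls back to the supplied default.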
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Schwabacher, M., Langley, P., Potter, C., Klooster, S., Torregrosa, A. (2007). Discovering Communicable Models from Earth Science Data. In: Džeroski, S., Todorovski, L. (eds) Computational Discovery of Scientific Knowledge. Lecture Notes in Computer Science, vol 4660. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73920-3_7
DOI: https://doi.org/10.1007/978-3-540-73920-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73919-7
Online ISBN: 978-3-540-73920-3
eBook Packages: Computer Science (R0)