Effects of Data Grouping on Calibration Measures of Classifiers

Dreiseitl, Stephan; Osl, Melanie

doi:10.1007/978-3-642-27549-4_46

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6927)

Included in the following conference series:

International Conference on Computer Aided Systems Theory

Abstract

The calibration of a probabilistic classifier refers to the extent to which its probability estimates match the true class membership probabilities. Measuring the calibration of a classifier usually relies on performing chi-squared goodness-of-fit tests between grouped probabilities and the observations in these groups.
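As a concrete reference point, the standard grouped test of this kind, the Hosmer-Lemeshow test, can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes near-equal-size groups formed from the sorted predicted probabilities ("deciles of risk") and uses G - 2 degrees of freedom, the usual convention for a fitted logistic regression model. The helper names `chi2_sf` and `hosmer_lemeshow` and the synthetic data are hypothetical, introduced only for illustration.

```python
import math
import random


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-squared variable with integer df >= 1 (standard library only)."""
    y = x / 2.0
    # Regularized upper incomplete gamma Q(df/2, y) via Q(a + 1, y) = Q(a, y) + y**a e**-y / Gamma(a + 1).
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def hosmer_lemeshow(p, y, n_groups=10):
    """Hosmer-Lemeshow C-hat with groups formed from sorted predicted probabilities.

    p: predicted probabilities of class 1; y: observed labels in {0, 1}.
    Returns (statistic, p_value) with n_groups - 2 degrees of freedom (assumed convention).
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(n_groups):
        # Consecutive, near-equal-size slices of the sorted predictions ("deciles of risk").
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        mean_p = expected / len(idx)
        # (O - E)^2 / (n_g * mean_p * (1 - mean_p)), written with E = n_g * mean_p.
        stat += (observed - expected) ** 2 / (expected * (1.0 - mean_p))
    return stat, chi2_sf(stat, n_groups - 2)


# Hypothetical data: the true probabilities are known, so calibration can be checked.
random.seed(0)
true_p = [random.random() for _ in range(2000)]
labels = [1 if random.random() < t else 0 for t in true_p]
good_stat, good_p = hosmer_lemeshow(true_p, labels)                 # calibrated predictions
bad_stat, bad_p = hosmer_lemeshow([t * t for t in true_p], labels)  # systematically too low
```

Because the synthetic labels are drawn from the known probabilities, the first call should usually not reject calibration, while the squared predictions produce a very large statistic. The standard-library `chi2_sf` only avoids a SciPy dependency; where SciPy is available, `scipy.stats.chi2.sf` serves the same purpose.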

We considered alternatives to the Hosmer-Lemeshow test, the standard chi-squared test with groups based on sorted model outputs. Since this grouping does not represent “natural” groupings in data space, we investigated a chi-squared test with grouping strategies in data space. Using a series of artificial data sets for which the correct models are known, and one real-world data set, we analyzed the performance of the Pigeon-Heyse test with groupings by self-organizing maps, k-means clustering, and random assignment of points to groups. We observed that the Pigeon-Heyse test offers slightly better performance than the Hosmer-Lemeshow test while being able to locate regions of poor calibration in data space.
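The idea of grouping in data space rather than by sorted outputs can be illustrated with a similar sketch. It is an illustration under stated assumptions, not the paper's experimental setup: it uses the Pigeon-Heyse J² statistic in its simplified per-group form, (observed - expected)² divided by the sum of p_i(1 - p_i) over the group (which is what the Pigeon-Heyse correction factor reduces to), assumes G - 1 degrees of freedom, and substitutes plain k-means and random assignment for the self-organizing-map grouping. The data, the miscalibrated model and all function names (`pigeon_heyse`, `kmeans_groups`, `chi2_sf`) are hypothetical.

```python
import math
import random


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-squared variable with integer df >= 1 (standard library only)."""
    y = x / 2.0
    # Regularized upper incomplete gamma Q(df/2, y) via Q(a + 1, y) = Q(a, y) + y**a e**-y / Gamma(a + 1).
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def pigeon_heyse(p, y, groups):
    """Pigeon-Heyse J^2 for an arbitrary grouping (groups[i] is the group of point i).

    Each group contributes (observed - expected)^2 / sum(p_i * (1 - p_i)).
    Returns (J2, p_value, contributions); G - 1 degrees of freedom is an assumption here.
    """
    obs, expd, var = {}, {}, {}
    for pi, yi, g in zip(p, y, groups):
        obs[g] = obs.get(g, 0) + yi
        expd[g] = expd.get(g, 0.0) + pi
        var[g] = var.get(g, 0.0) + pi * (1.0 - pi)
    contrib = {g: (obs[g] - expd[g]) ** 2 / var[g] for g in obs}
    stat = sum(contrib.values())
    return stat, chi2_sf(stat, len(contrib) - 1), contrib


def kmeans_groups(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means in data space; returns the cluster index of every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in points]
        for c in range(k):
            members = [x for x, lab in zip(points, assign) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


# Hypothetical 2-D data with known class probabilities.
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(3000)]
true_p = [1.0 / (1.0 + math.exp(-3.0 * (a - b))) for a, b in points]
labels = [1 if rng.random() < t else 0 for t in true_p]
# Hypothetical model that overestimates risk only in the corner x1 > 0.5, x2 > 0.5.
model_p = [min(0.99, t + 0.3) if a > 0.5 and b > 0.5 else t
           for t, (a, b) in zip(true_p, points)]

k_stat, k_p, k_contrib = pigeon_heyse(model_p, labels, kmeans_groups(points, 10))
r_stat, r_p, _ = pigeon_heyse(model_p, labels, [rng.randrange(10) for _ in points])
worst = max(k_contrib, key=k_contrib.get)  # cluster with the largest contribution to J^2
```

Returning the per-group contributions is what makes localization possible: the cluster `worst` points to the part of data space (here the hypothetical corner x1 > 0.5, x2 > 0.5) where predictions and outcomes disagree most. Random assignment spreads the same bias thinly over all groups, which in this toy setup typically yields a smaller statistic than the cluster-based grouping.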

References

Lasko, T., Bhagwat, J., Zhou, K., Ohno-Machado, L.: The use of receiver operating characteristic curves in biomedical informatics. Journal of Biomedical Informatics 38(5), 404–415 (2005)
Fawcett, T.: An introduction to ROC analysis. Pattern Recognition Letters 27(8), 861–874 (2006)
Zadrozny, B., Elkan, C.: Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 694–699 (2002)
Hosmer, D., Lemeshow, S.: A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics A10, 1043–1069 (1980)
Hosmer, D.W., Lemeshow, S.: Applied Logistic Regression, 2nd edn. Wiley-Interscience Publication, Hoboken (2000)
Xie, X.J., Pendergast, J., Clarke, W.: Increasing the power: A practical approach to goodness-of-fit test for logistic regression models with continuous predictors. Computational Statistics & Data Analysis 52, 2703–2713 (2008)
Bertolini, G., D’Amico, R., Nardi, D., Tinazzi, A., Apolone, G.: One model, several results: the paradox of the Hosmer-Lemeshow goodness-of-fit test for the logistic regression model. Journal of Epidemiology and Biostatistics 5(4), 251–253 (2000)
Kuss, O.: Global goodness-of-fit tests in logistic regression with sparse data. Statistics in Medicine 21, 3789–3801 (2002)
Hosmer, D., Hosmer, T., le Cessie, S., Lemeshow, S.: A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine 16(9), 965–980 (1997)
Pigeon, J.G., Heyse, J.F.: An improved goodness of fit statistic for probability prediction models. Biometrical Journal 41(1), 71–82 (1999)
Tsiatis, A.: A note on a goodness-of-fit test for the logistic regression model. Biometrika 67(1), 250–251 (1980)
Pigeon, J.G., Heyse, J.F.: A cautionary note about assessing the fit of logistic regression models. Journal of Applied Statistics 26(7), 847–853 (1999)
Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press, Cambridge (2001)
Kennedy, R., Burton, A., Fraser, H., McStay, L., Harrison, R.: Early diagnosis of acute myocardial infarction using clinical and electrocardiographic data at presentation: derivation and evaluation of logistic regression models. European Heart Journal 17(8), 1181–1191 (1996)

Author information

Authors and Affiliations

Dept. of Software Engineering, Upper Austria University of Applied Sciences, A-4232, Hagenberg, Austria
Stephan Dreiseitl
Division of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA
Melanie Osl

Editor information

Editors and Affiliations

Instituto Universitario de Ciencias y Tecnologías Cibernéticas, Universidad de Las Palmas de Gran Canaria, Campus de Tafira, 35017, Las Palmas de Gran Canaria, Spain
Roberto Moreno-Díaz & Alexis Quesada-Arencibia
Institute of Systems Science, Johannes Kepler University Linz, Altenbergerstrasse 69, 4040, Linz, Austria
Franz Pichler

About this paper

Cite this paper

Dreiseitl, S., Osl, M. (2012). Effects of Data Grouping on Calibration Measures of Classifiers. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2011. EUROCAST 2011. Lecture Notes in Computer Science, vol 6927. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27549-4_46

DOI: https://doi.org/10.1007/978-3-642-27549-4_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27548-7
Online ISBN: 978-3-642-27549-4
eBook Packages: Computer Science, Computer Science (R0)