
Effects of Data Grouping on Calibration Measures of Classifiers

  • Conference paper
Computer Aided Systems Theory – EUROCAST 2011 (EUROCAST 2011)

Part of the book series: Lecture Notes in Computer Science, volume 6927


Abstract

The calibration of a probabilistic classifier refers to the extent to which its probability estimates match the true class membership probabilities. Measuring the calibration of a classifier usually relies on chi-squared goodness-of-fit tests that compare grouped predicted probabilities with the observed outcomes in these groups.
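To make the grouped chi-squared idea concrete, here is a minimal sketch of the Hosmer-Lemeshow statistic, assuming binary labels and the common variant that sorts points by predicted probability and cuts them into groups of (nearly) equal size. For each group g it compares the observed number of positives O_g with the expected number E_g = sum of p_i, scaled by n_g p̄_g (1 − p̄_g). The function name `hosmer_lemeshow` and the default of ten groups are illustrative assumptions, not taken from the paper.

```python
def hosmer_lemeshow(probs, labels, n_groups=10):
    """Hosmer-Lemeshow statistic with groups formed from sorted predictions.

    probs  -- predicted probabilities p_i of class 1
    labels -- observed outcomes y_i in {0, 1}
    For a logistic regression model fitted on the same data, the statistic
    is conventionally compared with a chi-squared distribution with
    n_groups - 2 degrees of freedom.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have equal length")
    # Sort point indices by predicted probability (ties keep input order),
    # then cut the sorted sequence into n_groups groups of nearly equal size.
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    stat = 0.0
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue
        n_g = len(idx)
        observed = sum(labels[i] for i in idx)   # O_g: observed positives
        expected = sum(probs[i] for i in idx)    # E_g: expected positives
        p_bar = expected / n_g                   # mean predicted probability
        denom = n_g * p_bar * (1.0 - p_bar)
        if denom > 0:                            # skip degenerate groups
            stat += (observed - expected) ** 2 / denom
    return stat


# Predictions of 0.2 for two positives and 0.8 for two negatives are badly
# calibrated, so both groups contribute heavily to the statistic.
print(hosmer_lemeshow([0.2, 0.2, 0.8, 0.8], [1, 1, 0, 0], n_groups=2))
```

Because the groups are defined purely by the order of the model outputs, points that are far apart in data space can share a group, which is the limitation the grouping strategies below address.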

We considered alternatives to the Hosmer-Lemeshow test, the standard chi-squared test with groups based on sorted model outputs. Since this grouping does not represent “natural” groupings in data space, we investigated a chi-squared test with grouping strategies in data space. Using a series of artificial data sets for which the correct models are known, and one real-world data set, we analyzed the performance of the Pigeon-Heyse test with groupings by self-organizing maps, k-means clustering, and random assignment of points to groups. We observed that the Pigeon-Heyse test offers slightly better performance than the Hosmer-Lemeshow test while being able to locate regions of poor calibration in data space.
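A minimal sketch of a Pigeon-Heyse style statistic for an arbitrary grouping in data space follows, under our own assumptions. When groups are not formed from sorted outputs, predictions within a group can vary widely, and the Hosmer-Lemeshow denominator n_g p̄_g (1 − p̄_g) then overstates the variance of O_g. The sketch therefore divides by the sum of p_i (1 − p_i) over the group, which is the variance of a sum of independent Bernoulli outcomes under the model. It also returns each group's contribution, so that groups (regions of data space) with poor calibration can be identified. The names `pigeon_heyse` and `random_groups` are hypothetical, and the reference chi-squared distribution and its degrees of freedom are not reproduced here.

```python
import random
from collections import defaultdict


def pigeon_heyse(probs, labels, groups):
    """Pigeon-Heyse style statistic for a given grouping of the points.

    groups[i] is the group id of point i, e.g. a k-means cluster index,
    a self-organizing map unit, or a random assignment.
    Returns (statistic, per-group contributions).
    """
    if not (len(probs) == len(labels) == len(groups)):
        raise ValueError("probs, labels and groups must have equal length")
    observed = defaultdict(float)   # sum of y_i per group
    expected = defaultdict(float)   # sum of p_i per group
    variance = defaultdict(float)   # sum of p_i * (1 - p_i) per group
    for p, y, g in zip(probs, labels, groups):
        observed[g] += y
        expected[g] += p
        variance[g] += p * (1.0 - p)
    contributions = {}
    for g in expected:
        if variance[g] > 0:         # skip groups with no model variance
            contributions[g] = (observed[g] - expected[g]) ** 2 / variance[g]
    return sum(contributions.values()), contributions


def random_groups(n_points, n_groups, seed=0):
    """Assign each point to one of n_groups groups uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(n_groups) for _ in range(n_points)]


# Group 0 is well calibrated; group 1 predicts 0.5 for two positives.
stat, contrib = pigeon_heyse([0.1, 0.9, 0.5, 0.5], [0, 1, 1, 1], [0, 0, 1, 1])
print(stat, contrib)
```

Cluster-based groupings can be passed in the same way; for example, scikit-learn's `KMeans(n_clusters=...).fit_predict(X)` returns one cluster index per row of `X`, which could serve as `groups`. The largest entries of the returned contributions point to the groups, and hence the regions of data space, where calibration is worst.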

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dreiseitl, S., Osl, M. (2012). Effects of Data Grouping on Calibration Measures of Classifiers. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2011. EUROCAST 2011. Lecture Notes in Computer Science, vol 6927. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27549-4_46

  • DOI: https://doi.org/10.1007/978-3-642-27549-4_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27548-7

  • Online ISBN: 978-3-642-27549-4

  • eBook Packages: Computer Science, Computer Science (R0)
