Abstract
Classifier calibration is concerned with the scale on which a classifier’s scores are expressed. While a classifier ultimately maps instances to discrete classes, it is often beneficial to decompose this mapping into a scoring classifier, which outputs one or more real-valued numbers, and a decision rule, which converts these numbers into predicted classes. For example, a linear classifier might output a positive or negative score whose magnitude is proportional to the distance between the instance and the decision boundary, in which case the decision rule would be a simple threshold on that score. The advantage of calibrating these scores to a known, domain-independent scale is that the decision rule then also takes a domain-independent form and does not have to be learned. The best-known example occurs when the classifier’s scores approximate, in a precise sense, the posterior probability distribution over the classes; the optimal decision rule is then simply to predict the class that minimizes the expected cost, averaged over the possible true classes. The main methods for obtaining calibrated scores are logistic calibration, a parametric method which assumes that the distances on either side of the decision boundary are normally distributed, and a nonparametric alternative variously known as isotonic regression, the pool adjacent violators (PAV) method, or the ROC convex hull (ROCCH) method.
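To make the two calibration methods concrete, the following is a minimal sketch in Python using scikit-learn. The synthetic dataset, the choice of a linear SVM as the scoring classifier, and the misclassification costs are illustrative assumptions, not part of the entry itself.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression

# Uncalibrated scoring classifier: a linear SVM whose decision_function
# returns signed distances to the decision boundary.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.5,
                                                  random_state=0)
svm = LinearSVC(dual=False).fit(X_train, y_train)
scores = svm.decision_function(X_cal)

# Logistic (Platt) calibration: fit a sigmoid p = 1 / (1 + exp(-(a*s + b)))
# to held-out scores, here via a one-feature logistic regression.
platt = LogisticRegression().fit(scores.reshape(-1, 1), y_cal)
p_platt = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic (PAV) calibration: fit a non-decreasing step function mapping
# scores to empirical frequencies of the positive class.
iso = IsotonicRegression(out_of_bounds="clip").fit(scores, y_cal)
p_iso = iso.predict(scores)

# With calibrated probabilities the decision rule is domain-independent:
# predict positive when p >= c_fp / (c_fp + c_fn). The costs here are
# made-up values for illustration.
c_fp, c_fn = 1.0, 5.0
threshold = c_fp / (c_fp + c_fn)
y_pred = (p_iso >= threshold).astype(int)
```

The threshold follows from comparing expected costs: predicting positive is optimal exactly when p · c_fn ≥ (1 − p) · c_fp, which rearranges to p ≥ c_fp / (c_fp + c_fn); with equal costs this reduces to the familiar 0.5 threshold.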