Abstract
This paper investigates a number of techniques for calibrating the output of a Support Vector Machine (SVM) so that it provides a posterior probability P(target class | instance).
Five basic calibration techniques are combined with five ways of correcting the SVM scores on the training set. The calibration techniques are addition of a simple ramp function, allocation of a Gaussian density, fitting of a sigmoid to the output, and two binning techniques. The correction techniques comprise three methods based on recent theoretical advances in leave-one-out estimators and two variants of a hold-out validation set. This leads to thirty different settings (including calibration on uncorrected scores). All thirty methods are evaluated for two linear SVMs (one with a linear and one with a quadratic penalty) and for the ridge regression model (regularisation network) on three categories of the Reuters newswires benchmark and on the WebKB dataset. The performance of these methods is compared with the probabilities generated by a naive Bayes classifier and with a calibrated centroid classifier. A sigmoid-calibration sketch follows this paragraph.
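To make the calibration step concrete, the sketch below shows sigmoid fitting in the style of Platt, one of the five calibrators listed above: an SVM score s is mapped to P(target | s) = 1 / (1 + exp(A·s + B)), with A and B chosen by maximum likelihood on the training scores. This is a minimal illustration, not the paper's implementation; the function name and the choice of Nelder-Mead optimiser are assumptions, and Platt's regularised target values are omitted for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def fit_sigmoid_calibrator(scores, labels):
    """Fit P(target | s) = 1 / (1 + exp(A*s + B)) by minimising the
    negative log-likelihood of the 0/1 training labels.

    Simplified Platt-style calibration; Platt's regularised targets
    are omitted here for brevity.
    """
    y = np.asarray(labels, dtype=float)  # 1 = target class, 0 = rest

    def nll(params):
        A, B = params
        z = np.clip(A * scores + B, -500, 500)   # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(z))
        eps = 1e-12                              # guard against log(0)
        return -np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

    result = minimize(nll, x0=np.array([-1.0, 0.0]), method="Nelder-Mead")
    A, B = result.x
    return lambda s: 1.0 / (1.0 + np.exp(np.clip(A * s + B, -500, 500)))
```

A calibrator fitted this way on (possibly corrected) training scores is then applied unchanged to the test scores.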
The main conclusions of this research are: (i) simple calibrators such as the ramp and the sigmoid perform remarkably well; (ii) score correctors using leave-one-out techniques can perform better than those using validation sets; however, cross-validation methods allow more reliable estimation of the test error from the training data.
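The leave-one-out score correctors referred to in conclusion (ii) can be illustrated with the classical Jaakkola-Haussler bound: leaving out training example i lowers y_i·f(x_i) by at most α_i·K(x_i, x_i), so a pessimistic corrector shifts each training score toward the decision boundary by that amount before calibration. The sketch below assumes access to the dual coefficients and the kernel matrix; it is one plausible reading of such a corrector, not the paper's exact procedure.

```python
import numpy as np

def loo_corrected_scores(scores, labels, alphas, K):
    """Pessimistic leave-one-out correction of training scores, in the
    spirit of the Jaakkola-Haussler bound:

        y_i * f_loo(x_i) >= y_i * f(x_i) - alpha_i * K(x_i, x_i)

    labels are in {-1, +1}; alphas are the SVM dual coefficients and
    K is the kernel matrix on the training set.
    """
    return scores - labels * alphas * np.diag(K)

# The corrected scores then replace the raw ones when fitting a
# calibrator, e.g. the sigmoid sketched above:
#   calibrate = fit_sigmoid_calibrator(loo_corrected_scores(...), y01)
```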
Cite this paper
Kowalczyk, A., Raskutti, B., Ferrá, H. (2004). Exploring Potential of Leave-One-Out Estimator for Calibration of SVM in Text Mining. In: Dai, H., Srikant, R., Zhang, C. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, vol. 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_44
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3