Abstract
Vapnik described the “three main learning problems” of pattern recognition, regression estimation and density estimation. These are defined in terms of the loss functions used to evaluate performance (0-1 loss, squared loss, and log loss, respectively). But there are many other loss functions one could use. In this chapter I will summarise some recent work by me and colleagues studying the theoretical aspects of loss functions. The results elucidate the richness of the set of loss functions and explain some of the implications of their choice.
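To make the correspondence concrete, here is a minimal sketch (illustrative only, not from the chapter; the function names are my own) evaluating the three canonical losses on a single observation:

import math

def zero_one_loss(y, yhat):
    # 0-1 loss for pattern recognition: 1 if the prediction is wrong, else 0.
    return 0.0 if y == yhat else 1.0

def squared_loss(y, yhat):
    # Squared loss for regression estimation.
    return (y - yhat) ** 2

def log_loss(y, p):
    # Log loss for density (class probability) estimation: y is the observed
    # binary label, p the predicted probability that y = 1.
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

print(zero_one_loss(1, 0))     # 1.0 (misclassified)
print(squared_loss(2.5, 2.0))  # 0.25
print(log_loss(1, 0.9))        # approx. 0.105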
References
Bartlett, P., Jordan, M., McAuliffe, J.: Convexity, classification, and risk bounds. J. Am. Stat. Assoc. 101(473), 138–156 (2006)
Bartlett, P.L., Long, P.M., Williamson, R.C.: Fat-shattering and the learnability of real-valued functions. J. Comput. Syst. Sci. 52(3), 434–452 (1996)
Berger, J.O.: Statistical Decision Theory and Bayesian Analysis. Springer, New York (1985)
Buja, A., Stuetzle, W., Shen, Y.: Loss functions for binary class probability estimation and classification: structure and applications. Technical report, University of Pennsylvania (2005)
Csiszár, I.: Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica 2, 299–318 (1967)
DeGroot, M.H.: Uncertainty, information, and sequential experiments. Ann. Math. Stat. 33(2), 404–419 (1962)
García-García, D., Williamson, R.C.: Divergences and risks for multiclass experiments. In: Conference on Learning Theory (JMLR: W&CP), Edinburgh, vol. 23, pp. 28.1–28.20 (2012)
Hand, D.: Deconstructing statistical questions. J. R. Stat. Soc. A (Stat. Soc.) 157(3), 317–356 (1994)
Hand, D., Vinciotti, V.: Local versus global models for classification problems: fitting models where it matters. Am. Stat. 57(2), 124–131 (2003)
Herbrich, R., Williamson, R.: Algorithmic luckiness. J. Mach. Learn. Res. 3(2), 175–212 (2002)
Kivinen, J., Smola, A., Williamson, R.: Online learning with kernels. IEEE Trans. Signal Proc. 52(8), 2165–2176 (2004)
Lacoste-Julien, S., Huszár, F., Ghahramani, Z.: Approximate inference for the loss-calibrated Bayesian. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale (2011)
Lee, W., Bartlett, P., Williamson, R.: The importance of convexity in learning with squared loss. IEEE Trans. Inf. Theory 44(5), 1974–1980 (1998)
Liese, F., Vajda, I.: On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 52(10), 4394–4412 (2006)
Lindley, D.: On a measure of the information provided by an experiment. Ann. Math. Stat. 27(4), 986–1005 (1956)
Reid, M.D., Williamson, R.C.: Surrogate regret bounds for proper losses. In: Proceedings of the International Conference on Machine Learning, Montreal, pp. 897–904 (2009)
Reid, M.D., Williamson, R.C.: Composite binary losses. J. Mach. Learn. Res. 11, 2387–2422 (2010)
Reid, M.D., Williamson, R.C.: Information, divergence and risk for binary experiments. J. Mach. Learn. Res. 12, 731–817 (2011)
Schölkopf, B., Smola, A., Williamson, R.C., Bartlett, P.L.: New support vector algorithms. Neural Comput. 12, 1207–1245 (2000)
Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001)
Shawe-Taylor, J., Bartlett, P., Williamson, R., Anthony, M.: Structural risk minimization over data-dependent hierarchies. IEEE Trans. Inf. Theory 44(5), 1926–1940 (1998)
van Erven, T., Grünwald, P., Reid, M.D., Williamson, R.C.: Mixability in statistical learning. In: Neural Information Processing Systems, Lake Tahoe (2012)
van Erven, T., Reid, M.D., Williamson, R.C.: Mixability is Bayes risk curvature relative to log loss. J. Mach. Learn. Res. 13, 1639–1663 (2012)
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
Vernet, E., Williamson, R.C., Reid, M.D.: Composite multiclass losses. In: Neural Information Processing Systems, Granada (2011)
Vernet, E., Williamson, R.C., Reid, M.D.: Composite multiclass losses. J. Mach. Learn. Res. 42 (2012) (Submitted)
Vovk, V.: A game of prediction with expert advice. In: Proceedings of the 8th Annual Conference on Computational Learning Theory, Santa Cruz, pp. 51–60. ACM (1995)
Williamson, R.C.: Introduction. In: Introductory Talk at Workshop on Relations Between Machine Learning Problems, NIPS 2011 Workshops, Sierra Nevada (2011)
Williamson, R.C., Smola, A., Schölkopf, B.: Generalization performance of regularization networks and support-vector machines via entropy numbers of compact operators. IEEE Trans. Inf. Theory 47(6), 2516–2532 (2001)
Acknowledgements
This work was supported by the Australian Government through the Australian Research Council and through NICTA, which is co-funded by the Department of Broadband, Communications and the Digital Economy.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Williamson, R.C. (2013). Loss Functions. In: Schölkopf, B., Luo, Z., Vovk, V. (eds) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_8
DOI: https://doi.org/10.1007/978-3-642-41136-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41135-9
Online ISBN: 978-3-642-41136-6