Abstract
Vapnik described the “three main learning problems” of pattern recognition, regression estimation and density estimation. These are defined in terms of the loss functions used to evaluate performance (0-1 loss, squared loss, and log loss, respectively). But there are many other loss functions one could use. In this chapter I will summarise some recent work by me and colleagues studying the theoretical aspects of loss functions. The results elucidate the richness of the set of loss functions and explain some of the implications of their choice.
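To make the correspondence concrete, here is a minimal sketch (illustrative only, not from the chapter; the function names are my own) evaluating the three canonical losses on a single observation:

import math

def zero_one_loss(y, yhat):
    # 0-1 loss for pattern recognition: 1 if the prediction is wrong, else 0.
    return 0.0 if y == yhat else 1.0

def squared_loss(y, yhat):
    # Squared loss for regression estimation.
    return (y - yhat) ** 2

def log_loss(y, p):
    # Log loss for density (class probability) estimation: y is the observed
    # binary label, p the predicted probability that y = 1.
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

print(zero_one_loss(1, 0))     # 1.0 (misclassified)
print(squared_loss(2.5, 2.0))  # 0.25
print(log_loss(1, 0.9))        # approx. 0.105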
References
Bartlett, P., Jordan, M., McAuliffe, J.: Convexity, classification, and risk bounds. J. Am. Stat. Assoc. 101(473), 138–156 (2006)
Bartlett, P.L., Long, P.M., Williamson, R.C.: Fat-shattering and the learnability of real-valued functions. J. Comput. Syst. Sci. 52(3), 434–452 (1996)
Berger, J.O.: Statistical Decision Theory and Bayesian Analysis. Springer, New York (1985)
Buja, A., Stuetzle, W., Shen, Y.: Loss functions for binary class probability estimation and classification: structure and applications. Technical report, University of Pennsylvania (2005)
Csiszár, I.: Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica 2, 299–318 (1967)
DeGroot, M.H.: Uncertainty, information, and sequential experiments. Ann. Math. Stat. 33(2), 404–419 (1962)
García-García, D., Williamson, R.C.: Divergences and risks for multiclass experiments. In: Conference on Learning Theory (JMLR: W&CP), Edinburgh, vol. 23, pp. 28.1–28.20 (2012)
Hand, D.: Deconstructing statistical questions. J. R. Stat. Soc. A (Stat. Soc.) 157(3), 317–356 (1994)
Hand, D., Vinciotti, V.: Local versus global models for classification problems: fitting models where it matters. Am. Stat. 57(2), 124–131 (2003)
Herbrich, R., Williamson, R.: Algorithmic luckiness. J. Mach. Learn. Res. 3(2), 175–212 (2002)
Kivinen, J., Smola, A., Williamson, R.: Online learning with kernels. IEEE Trans. Signal Proc. 52(8), 2165–2176 (2004)
Lacoste-Julien, S., Huszár, F., Ghahramani, Z.: Approximate inference for the loss-calibrated Bayesian. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale (2011)
Lee, W., Bartlett, P., Williamson, R.: The importance of convexity in learning with squared loss. IEEE Trans. Inf. Theory 44(5), 1974–1980 (1998)
Liese, F., Vajda, I.: On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 52(10), 4394–4412 (2006)
Lindley, D.: On a measure of the information provided by an experiment. Ann. Math. Stat. 27(4), 986–1005 (1956)
Reid, M.D., Williamson, R.C.: Surrogate regret bounds for proper losses. In: Proceedings of the International Conference on Machine Learning, Montreal, pp. 897–904 (2009)
Reid, M.D., Williamson, R.C.: Composite binary losses. J. Mach. Learn. Res. 11, 2387–2422 (2010)
Reid, M.D., Williamson, R.C.: Information, divergence and risk for binary experiments. J. Mach. Learn. Res. 12, 731–817 (2011)
Schölkopf, B., Smola, A., Williamson, R.C., Bartlett, P.L.: New support vector algorithms. Neural Comput. 12, 1207–1245 (2000)
Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001)
Shawe-Taylor, J., Bartlett, P., Williamson, R., Anthony, M.: Structural risk minimization over data-dependent hierarchies. IEEE Trans. Inf. Theory 44(5), 1926–1940 (1998)
van Erven, T., Grünwald, P., Reid, M.D., Williamson, R.C.: Mixability in statistical learning. In: Neural Information Processing Systems, Lake Tahoe (2012)
van Erven, T., Reid, M.D., Williamson, R.C.: Mixability is Bayes risk curvature relative to log loss. J. Mach. Learn. Res. 13, 1639–1663 (2012)
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
Vernet, E., Williamson, R.C., Reid, M.D.: Composite multiclass losses. In: Neural Information Processing Systems, Granada (2011)
Vernet, E., Williamson, R.C., Reid, M.D.: Composite multiclass losses. J. Mach. Learn. Res. 42 (2012) (Submitted)
Vovk, V.: A game of prediction with expert advice. In: Proceedings of the 8th Annual Conference on Computational Learning Theory, Santa Cruz, pp. 51–60. ACM (1995)
Williamson, R.C.: Introduction. In: Introductory Talk at Workshop on Relations Between Machine Learning Problems, NIPS 2011 Workshops, Sierra Nevada (2011)
Williamson, R.C., Smola, A., Schölkopf, B.: Generalization performance of regularization networks and support-vector machines via entropy numbers of compact operators. IEEE Trans. Inf. Theory 47(6), 2516–2532 (2001)
Acknowledgements
This work was supported by the Australian Government through the Australian Research Council and through NICTA, which is co-funded by the Department of Broadband, Communications and the Digital Economy.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Williamson, R.C. (2013). Loss Functions. In: Schölkopf, B., Luo, Z., Vovk, V. (eds) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_8
DOI: https://doi.org/10.1007/978-3-642-41136-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41135-9
Online ISBN: 978-3-642-41136-6