Loss Functions

Robert C. Williamson

Chapter in Empirical Inference

Abstract

Vapnik described the “three main learning problems”: pattern recognition, regression estimation and density estimation. These are defined in terms of the loss functions used to evaluate performance (0-1 loss, squared loss, and log loss, respectively). But there are many other loss functions one could use. In this chapter I summarise some recent work by my colleagues and me studying the theoretical aspects of loss functions. The results elucidate the richness of the set of loss functions and explain some of the implications of their choice.
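For concreteness, here is a minimal Python sketch of the three losses named in the abstract; the binary form of the log loss, the clipping constant, and the function names are illustrative choices, not taken from the chapter.

import math

def zero_one_loss(y, y_hat):
    # 0-1 loss (pattern recognition): 1 if the prediction is wrong, else 0.
    return 0.0 if y == y_hat else 1.0

def squared_loss(y, y_hat):
    # Squared loss (regression estimation).
    return (y - y_hat) ** 2

def log_loss(y, p_hat, eps=1e-12):
    # Log loss (density/probability estimation), binary case:
    # y in {0, 1}, p_hat = predicted probability that y = 1.
    p_hat = min(max(p_hat, eps), 1.0 - eps)  # clip to keep the logarithm finite
    return -(y * math.log(p_hat) + (1 - y) * math.log(1.0 - p_hat))

For example, zero_one_loss(1, 0) gives 1.0, squared_loss(2.0, 1.5) gives 0.25, and log_loss(1, 0.9) gives roughly 0.105; a confident wrong prediction such as log_loss(1, 0.01) is penalised far more heavily (about 4.6).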



Acknowledgements

This work was supported by the Australian Government through the Australian Research Council and through NICTA, which is co-funded by the Department of Broadband, Communications and the Digital Economy.

Author information


Correspondence to Robert C. Williamson.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Williamson, R.C. (2013). Loss Functions. In: Schölkopf, B., Luo, Z., Vovk, V. (eds) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_8


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41135-9

  • Online ISBN: 978-3-642-41136-6

  • eBook Packages: Computer Science (R0)
