
Generalization Bounds

  • Reference work entry in the Encyclopedia of Machine Learning and Data Mining

Synonyms

Inequalities; Sample complexity

Definition

In the theory of statistical machine learning, a generalization bound – or, more precisely, a generalization error bound – is a statement about the predictive performance of a learning algorithm or class of algorithms. Here, a learning algorithm is viewed as a procedure that takes some finite training sample of labeled instances as input and returns a hypothesis regarding the labels of all instances, including those which may not have appeared in the training sample. Assuming labeled instances are drawn from some fixed distribution, the quality of a hypothesis can be measured in terms of its risk – its incompatibility with the distribution. The performance of a learning algorithm can then be expressed in terms of the expected risk of its hypotheses given randomly generated training samples.
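These notions can be made concrete with a small sketch. The distribution (uniform on [-1, 1]), the labeling rule, the hypothesis, and the 0-1 loss below are illustrative choices, not part of the entry:

```python
import random

def true_labeler(x):
    # Ground-truth rule that, together with the instance
    # distribution, defines the labeled-instance distribution.
    return 1 if x >= 0 else 0

def hypothesis(x):
    # A learned hypothesis with a slightly wrong threshold; its true
    # risk is P(0 <= x < 0.1) = 0.05 under the uniform distribution.
    return 1 if x >= 0.1 else 0

def empirical_risk(h, sample):
    # Fraction of labeled instances the hypothesis gets wrong (0-1 loss).
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

random.seed(0)
# Draw a training sample from the fixed distribution: uniform on [-1, 1].
sample = [(x, true_labeler(x)) for x in (random.uniform(-1, 1) for _ in range(1000))]

# The empirical risk concentrates around the true risk of 0.05.
print(empirical_risk(hypothesis, sample))
```

The gap between the printed empirical risk and the true risk of 0.05 is exactly what a generalization bound controls.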

Under these assumptions, a generalization bound is a theorem that holds for any distribution and states that, with high probability, applying...
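For a single fixed hypothesis and a [0, 1]-bounded loss, Hoeffding's inequality already yields a bound of this shape: with probability at least 1 − δ over samples of size m, the risk exceeds the empirical risk by at most √(ln(2/δ)/(2m)). A minimal sketch of that gap (the function name is illustrative):

```python
import math

def hoeffding_gap(m, delta):
    # Width of the two-sided Hoeffding confidence interval for a
    # [0, 1]-bounded loss: with probability at least 1 - delta over
    # samples of size m, risk <= empirical risk + this gap.
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))

# The gap shrinks at rate O(1/sqrt(m)) as the sample size grows.
for m in (100, 10000, 1000000):
    print(m, round(hoeffding_gap(m, delta=0.05), 4))
```

Bounds that hold uniformly over a whole hypothesis class (VC, Rademacher, PAC-Bayesian) replace the ln(2/δ) term with a complexity measure of the class but keep this overall form.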


Recommended Reading

As mentioned above, the uniform convergence bounds by Vapnik and Chervonenkis (1971) and the PAC framework of Valiant (1984) were the first generalization bounds for statistical learning. Ideas from both were synthesized and extended by Blumer et al. (1989). The book by Kearns and Vazirani (1994) provides a good overview of the early PAC-style bounds, while Vapnik's comprehensive book (Vapnik 1998) and Anthony and Bartlett's book (1999) cover classification and regression bounds involving the VC dimension. Rademacher averages were first considered as an alternative to the VC dimension in the context of learning theory by Koltchinskii and Panchenko (2001) and were refined and extended by Bartlett and Mendelson (2003), who provide a readable overview. Early PAC-Bayesian bounds were established by McAllester (1999) based on an earlier PAC analysis of Bayesian estimators by Shawe-Taylor and Williamson (1997). Applications of the PAC-Bayesian bound to SVMs are discussed in Langford's tutorial on prediction theory (Langford 2005), and a recent paper by Banerjee (2006) provides an information-theoretic motivation, a simple proof of the bound in (11), as well as connections with similar bounds in online learning. There are several well-written surveys of generalization bounds and learning theory in general. Herbrich and Williamson (2002) present a unified view of VC, compression, luckiness, PAC-Bayesian, and stability bounds. In a very readable introduction to statistical learning theory, Bousquet et al. (2004) provide good intuition and concise proofs for all but the PAC-Bayesian bounds presented above. That introduction is a good companion for the excellent but more technical survey by Boucheron et al. (2005) based on tools from the theory of empirical processes. The latter paper also provides a wealth of further references and a concise history of the development of the main techniques in statistical learning theory.

  • Anthony M, Bartlett PL (1999) Neural network learning: theoretical foundations. Cambridge University Press, Cambridge

  • Banerjee A (2006) On Bayesian bounds. In: ICML'06: proceedings of the 23rd international conference on machine learning, Pittsburgh, pp 81–88

  • Bartlett PL, Mendelson S (2003) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482

  • Beygelzimer A, Langford J, Zadrozny B (2008) Machine learning techniques – reductions between prediction quality metrics. In: Zhen L, Cathy HX (eds) Performance modeling and engineering. Springer, New York, pp 3–28

  • Blumer A, Ehrenfeucht A, Haussler D, Warmuth MK (1989) Learnability and the Vapnik-Chervonenkis dimension. J ACM 36(4):929–965

  • Boucheron S, Bousquet O, Lugosi G (2005) Theory of classification: a survey of some recent advances. ESAIM Probab Stat 9:323–375

  • Bousquet O, Boucheron S, Lugosi G (2004) Introduction to statistical learning theory. Lecture notes in artificial intelligence, vol 3176. Springer, Berlin, pp 169–207

  • Herbrich R, Williamson RC (2002) Learning and generalization: theory and bounds. In: Arbib M (ed) Handbook of brain theory and neural networks, 2nd edn. MIT Press, Cambridge

  • Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge

  • Koltchinskii V (2001) Rademacher penalties and structural risk minimization. IEEE Trans Inf Theory 47(5):1902–1914

  • Langford J (2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6(1):273–306

  • McAllester DA (1999) Some PAC-Bayesian theorems. Mach Learn 37(3):355–363

  • Shawe-Taylor J, Williamson RC (1997) A PAC analysis of a Bayesian estimator. In: Proceedings of the tenth annual conference on computational learning theory. ACM, New York, p 7

  • Valiant LG (1984) A theory of the learnable. Commun ACM 27(11):1134–1142

  • Vapnik VN (1998) Statistical learning theory. Wiley, New York

  • Vapnik VN, Chervonenkis AY (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–280

Copyright information

© 2017 Springer Science+Business Media New York

Cite this entry

Reid, M. (2017). Generalization Bounds. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_328
