Definition
In the theory of statistical machine learning, a generalization bound – or, more precisely, a generalization error bound – is a statement about the predictive performance of a learning algorithm or class of algorithms. Here, a learning algorithm is viewed as a procedure that takes some finite training sample of labeled instances as input and returns a hypothesis regarding the labels of all instances, including those which may not have appeared in the training sample. Assuming labeled instances are drawn from some fixed distribution, the quality of a hypothesis can be measured in terms of its risk – its incompatibility with the distribution. The performance of a learning algorithm can then be expressed in terms of the expected risk of its hypotheses given randomly generated training samples.
Under these assumptions, a generalization bound is a theorem that holds for any distribution and states that, with high probability, applying...
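The notions of risk and empirical risk above can be made concrete with a small sketch. The code below is illustrative only and is not the bound stated in this entry: it computes the classical two-sided Hoeffding confidence interval for a single fixed hypothesis with a [0, 1]-bounded loss. The function name `hoeffding_bound` and the toy data are assumptions made here for illustration.

```python
import math
import random

def hoeffding_bound(n, delta):
    """Width of the two-sided Hoeffding confidence interval for a
    [0, 1]-bounded loss: with probability at least 1 - delta over the
    draw of n i.i.d. samples, the true risk lies within this margin
    of the empirical risk."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# Toy illustration: one fixed hypothesis evaluated on i.i.d. labeled instances.
random.seed(0)
true_error = 0.3                     # hypothetical true risk of the hypothesis
n = 1000
losses = [1 if random.random() < true_error else 0 for _ in range(n)]
empirical_risk = sum(losses) / n     # observed fraction of mistakes
margin = hoeffding_bound(n, delta=0.05)

print(f"empirical risk = {empirical_risk:.3f}; with prob. >= 0.95 the true "
      f"risk lies in [{empirical_risk - margin:.3f}, {empirical_risk + margin:.3f}]")
```

Note that this interval holds for a single hypothesis fixed in advance; bounding the risk of a hypothesis *selected* by a learning algorithm requires the uniform-convergence machinery (VC dimension, Rademacher averages, PAC-Bayes) surveyed in the Recommended Reading.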
Recommended Reading
As mentioned above, the uniform convergence bounds of Vapnik and Chervonenkis (1971) and the PAC framework of Valiant (1984) were the first generalization bounds for statistical learning. Ideas from both were synthesized and extended by Blumer et al. (1989). The book by Kearns and Vazirani (1994) provides a good overview of the early PAC-style bounds, while Vapnik’s comprehensive book (Vapnik 1998) and Anthony and Bartlett’s book (1999) cover classification and regression bounds involving the VC dimension. Rademacher averages were first considered as an alternative to the VC dimension in the context of learning theory by Koltchinskii and Panchenko (2001) and were refined and extended by Bartlett and Mendelson (2003), who provide a readable overview. Early PAC-Bayesian bounds were established by McAllester (1999), based on an earlier PAC analysis of Bayesian estimators by Shawe-Taylor and Williamson (1997). Applications of the PAC-Bayesian bound to SVMs are discussed in Langford’s tutorial on prediction theory (Langford 2005), and a recent paper by Banerjee (2006) provides an information-theoretic motivation, a simple proof of the bound in (11), and connections with similar bounds in online learning.

There are several well-written surveys of generalization bounds and of learning theory in general. Herbrich and Williamson (2002) present a unified view of VC, compression, luckiness, PAC-Bayesian, and stability bounds. In a very readable introduction to statistical learning theory, Bousquet et al. (2004) provide good intuition and concise proofs for all but the PAC-Bayesian bounds presented above. That introduction is a good companion for the excellent but more technical survey by Boucheron et al. (2005), based on tools from the theory of empirical processes. The latter paper also provides a wealth of further references and a concise history of the development of the main techniques in statistical learning theory.
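As a rough illustration of the Rademacher-average approach mentioned above (this is not code from any of the cited works), the sketch below estimates the empirical Rademacher complexity of a small finite hypothesis class by Monte Carlo. The function name and the toy hypothesis class are assumptions made here for illustration.

```python
import random

def empirical_rademacher(predictions, trials=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of a
    finite hypothesis class.  `predictions` lists, for each hypothesis,
    its vector of +/-1 outputs on a fixed sample of size n.  The estimate
    averages sup_h (1/n) sum_i sigma_i h(x_i) over random sign vectors."""
    rng = random.Random(seed)
    n = len(predictions[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]  # Rademacher signs
        total += max(sum(s * p for s, p in zip(sigma, preds)) / n
                     for preds in predictions)
    return total / trials

# Two-hypothesis class (constant +1 and constant -1) on a sample of size 100:
# the supremum equals |mean of signs|, so the complexity is small, roughly 0.08.
h1 = [1] * 100
h2 = [-1] * 100
print(empirical_rademacher([h1, h2]))
```

A small value indicates the class cannot correlate with random noise on the sample, which is what lets the Rademacher-based bounds of Koltchinskii and Panchenko (2001) and Bartlett and Mendelson (2003) control the gap between empirical and true risk.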
Anthony M, Bartlett PL (1999) Neural network learning: theoretical foundations. Cambridge University Press, Cambridge
Banerjee A (2006) On Bayesian bounds. In: ICML’06: proceedings of the 23rd international conference on machine learning, Pittsburgh, pp 81–88
Bartlett PL, Mendelson S (2003) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482
Beygelzimer A, Langford J, Zadrozny B (2008) Machine learning techniques – reductions between prediction quality metrics. In: Liu Z, Xia CH (eds) Performance modeling and engineering. Springer, New York, pp 3–28
Blumer A, Ehrenfeucht A, Haussler D, Warmuth MK (1989) Learnability and the Vapnik-Chervonenkis dimension. J ACM (JACM) 36(4):929–965
Boucheron S, Bousquet O, Lugosi G (2005) Theory of classification: a survey of some recent advances. ESAIM Probab Stat 9:323–375
Bousquet O, Boucheron S, Lugosi G (2004) Introduction to statistical learning theory. Volume 3176 of lecture notes in artificial intelligence. Springer, Berlin, pp 169–207
Herbrich R, Williamson RC (2002) Learning and generalization: theory and bounds. In: Arbib M (ed) Handbook of brain theory and neural networks, 2nd ed. MIT Press, Cambridge
Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge
Koltchinskii V (2001) Rademacher penalties and structural risk minimization. IEEE Trans Inf Theory 47(5):1902–1914
Langford J (2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6(1):273–306
McAllester DA (1999) Some PAC-Bayesian theorems. Mach Learn 37(3):355–363
Shawe-Taylor J, Williamson RC (1997) A PAC analysis of a Bayesian estimator. In: Proceedings of the tenth annual conference on computational learning theory. ACM, New York, p 7
Valiant LG (1984) A theory of the learnable. Commun ACM 27(11):1134–1142
Vapnik VN (1998) Statistical learning theory. Wiley, New York
Vapnik VN, Chervonenkis AY (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–280
© 2017 Springer Science+Business Media New York
Reid, M. (2017). Generalization Bounds. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_328
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science; Reference Module Computer Science and Engineering