Abstract
In this paper we analyse the average behaviour of the Bayes-optimal and Gibbs learning algorithms. We do this both for off-training-set error and for conventional IID (independently and identically distributed) error, for which test sets may overlap with training sets. For the IID case we provide a major extension to one of the better-known results. We also show that expected IID test set error is a non-increasing function of training set size for either algorithm. On the other hand, as we show, the expected off-training-set error for both learning algorithms can increase with training set size for non-uniform sampling distributions. We characterize the relationship the sampling distribution must have with the prior for such an increase to occur. In particular, we show that for uniform sampling distributions and either algorithm, the expected off-training-set error is a non-increasing function of training set size. For uniform sampling distributions, we also characterize the priors for which the expected error of the Bayes-optimal algorithm stays constant. In addition we show that for the Bayes-optimal algorithm, expected off-training-set error can increase with training set size when the target function is fixed, but only if the expected error averaged over all targets decreases with training set size. Our results hold for arbitrary noise and arbitrary loss functions.
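The distinction between the two error measures can be made concrete with a toy example. The sketch below is purely illustrative and not taken from the paper: it assumes a small finite input space, a fixed noise-free boolean target, a uniform sampling distribution, and a hypothetical learner that memorizes its training set and guesses a constant elsewhere. IID error weights all inputs by the sampling distribution (so memorized training points count as correct), whereas off-training-set error renormalizes the sampling distribution over the inputs not seen in training.

```python
# Toy contrast of IID vs off-training-set (OTS) error.
# Hypothetical setup: finite input space X, fixed target, uniform sampling.
X = list(range(8))                      # input space
target = {x: x % 2 for x in X}          # fixed noise-free target function
train = {0: 0, 1: 1, 2: 0}              # training set: input -> label
p = {x: 1.0 / len(X) for x in X}        # uniform sampling distribution

def hypothesis(x):
    """Memorizes the training set; predicts 0 off the training set."""
    return train.get(x, 0)

# Conventional IID error: test inputs drawn from p, so the test set
# can overlap the training set (memorized points count as correct).
iid_error = sum(p[x] * (hypothesis(x) != target[x]) for x in X)

# OTS error: restrict to inputs outside the training set and
# renormalize p over that restricted set.
off = [x for x in X if x not in train]
z = sum(p[x] for x in off)
ots_error = sum((p[x] / z) * (hypothesis(x) != target[x]) for x in off)

print(iid_error, ots_error)   # OTS error exceeds IID error here
```

Because the memorizer is exactly right on its three training inputs, those inputs can only lower the IID error, while the OTS measure excludes them entirely; this is why the two measures can behave very differently as the training set grows.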
Cite this article
WOLPERT, D.H., KNILL, E. & GROSSMAN, T. Some results concerning off-training-set and IID error for the Gibbs and the Bayes optimal generalizers. Statistics and Computing 8, 35–54 (1998). https://doi.org/10.1023/A:1008867009312