Abstract
In this paper we analyse the average behaviour of the Bayes-optimal and Gibbs learning algorithms. We do this both for off-training-set error and for conventional IID (independently and identically distributed) error, for which test sets may overlap with training sets. For the IID case we provide a major extension to one of the better-known results. We also show that expected IID test set error is a non-increasing function of training set size for either algorithm. On the other hand, as we show, the expected off-training-set error for both learning algorithms can increase with training set size for non-uniform sampling distributions. We characterize the relationship the sampling distribution must have with the prior for such an increase to occur. In particular, we show that for uniform sampling distributions and either algorithm, the expected off-training-set error is a non-increasing function of training set size. For uniform sampling distributions, we also characterize the priors for which the expected error of the Bayes-optimal algorithm stays constant. In addition we show that for the Bayes-optimal algorithm, expected off-training-set error can increase with training set size when the target function is fixed, but only if the expected error averaged over all targets decreases with training set size. Our results hold for arbitrary noise and arbitrary loss functions.
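The distinction between the two error measures can be made concrete with a toy example. The sketch below is purely illustrative and not taken from the paper: it assumes a small finite input space, a fixed noise-free boolean target, a uniform sampling distribution, and a hypothetical learner that memorizes its training set and guesses a constant elsewhere. IID error weights all inputs by the sampling distribution (so memorized training points count as correct), whereas off-training-set error renormalizes the sampling distribution over the inputs not seen in training.

```python
# Toy contrast of IID vs off-training-set (OTS) error.
# Hypothetical setup: finite input space X, fixed target, uniform sampling.
X = list(range(8))                      # input space
target = {x: x % 2 for x in X}          # fixed noise-free target function
train = {0: 0, 1: 1, 2: 0}              # training set: input -> label
p = {x: 1.0 / len(X) for x in X}        # uniform sampling distribution

def hypothesis(x):
    """Memorizes the training set; predicts 0 off the training set."""
    return train.get(x, 0)

# Conventional IID error: test inputs drawn from p, so the test set
# can overlap the training set (memorized points count as correct).
iid_error = sum(p[x] * (hypothesis(x) != target[x]) for x in X)

# OTS error: restrict to inputs outside the training set and
# renormalize p over that restricted set.
off = [x for x in X if x not in train]
z = sum(p[x] for x in off)
ots_error = sum((p[x] / z) * (hypothesis(x) != target[x]) for x in off)

print(iid_error, ots_error)   # OTS error exceeds IID error here
```

Because the memorizer is exactly right on its three training inputs, those inputs can only lower the IID error, while the OTS measure excludes them entirely; this is why the two measures can behave very differently as the training set grows.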
Cite this article
WOLPERT, D.H., KNILL, E. & GROSSMAN, T. Some results concerning off-training-set and IID error for the Gibbs and the Bayes optimal generalizers. Statistics and Computing 8, 35–54 (1998). https://doi.org/10.1023/A:1008867009312