ABSTRACT
We show that several important Bayesian bounds studied in machine learning, in both the batch and online settings, arise from an application of a simple compression lemma. In particular, we use the compression lemma to derive (i) PAC-Bayesian bounds in the batch setting, (ii) Bayesian log-loss bounds in the online setting, and (iii) Bayesian bounded-loss bounds in the online setting. Although each setting has different semantics for the prior, the posterior, and the loss, we show that the core bound argument is the same. The paper simplifies our understanding of several important and apparently disparate results, and brings to light a powerful tool for developing similar arguments for other methods.
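For concreteness, the compression lemma invoked above is, in its standard form, the convex-duality (Donsker–Varadhan) characterization of relative entropy. The notation below (posterior $P$, prior $Q$, measurable function $\phi$ on a hypothesis space $\Theta$) is illustrative and not fixed by the abstract:

```latex
% Compression lemma (Donsker--Varadhan form): for any measurable
% \phi : \Theta \to \mathbb{R} and any probability distributions
% P, Q on \Theta,
\[
  \mathbb{E}_{\theta \sim P}\!\left[\phi(\theta)\right]
  \;-\;
  \log \mathbb{E}_{\theta \sim Q}\!\left[e^{\phi(\theta)}\right]
  \;\le\;
  \mathrm{KL}(P \,\|\, Q),
\]
% with equality when dP/dQ \propto e^{\phi}.
```

Roughly, the batch and online bounds then follow by instantiating $\phi$ with a suitably scaled loss (or log-loss) and bounding the log-moment term on the left.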