DOI: 10.1145/1143844.1143855
Article

On Bayesian bounds

Published: 25 June 2006

ABSTRACT

We show that several important Bayesian bounds studied in machine learning, in both the batch and the online setting, arise from a single application of a simple compression lemma. In particular, we use the compression lemma to derive (i) PAC-Bayesian bounds in the batch setting, and (ii) Bayesian log-loss bounds and (iii) Bayesian bounded-loss bounds in the online setting. Although each setting assigns different semantics to the prior, the posterior, and the loss, we show that the core bounding argument is the same. The paper simplifies our understanding of several important and apparently disparate results, and brings to light a powerful tool for developing similar arguments for other methods.
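The compression lemma the abstract refers to is, in its standard (Donsker–Varadhan) form: for distributions P (posterior) and Q (prior) over a hypothesis set and any real-valued function φ, E_P[φ] − log E_Q[e^φ] ≤ KL(P‖Q), with equality when φ = log(dP/dQ) up to an additive constant. The sketch below checks both the inequality and the equality case numerically on a small finite set; the statement is the standard one, and all function names and the finite-set setup are ours, not quoted from the paper.

```python
import math
import random

def kl(p, q):
    """KL divergence KL(P || Q) between two finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def lhs(p, q, phi):
    """Left-hand side of the compression lemma: E_P[phi] - log E_Q[exp(phi)]."""
    e_p = sum(pi * f for pi, f in zip(p, phi))
    log_mgf = math.log(sum(qi * math.exp(f) for qi, f in zip(q, phi)))
    return e_p - log_mgf

def normalize(w):
    s = sum(w)
    return [wi / s for wi in w]

random.seed(0)
n = 5
p = normalize([random.random() for _ in range(n)])  # "posterior"
q = normalize([random.random() for _ in range(n)])  # "prior"

# Any phi obeys the bound E_P[phi] - log E_Q[exp(phi)] <= KL(P || Q) ...
phi = [random.uniform(-2.0, 2.0) for _ in range(n)]
assert lhs(p, q, phi) <= kl(p, q) + 1e-12

# ... and phi = log(p/q) achieves it with equality (up to float error).
phi_star = [math.log(pi / qi) for pi, qi in zip(p, q)]
assert abs(lhs(p, q, phi_star) - kl(p, q)) < 1e-9
```

Taking the supremum over φ recovers KL(P‖Q) exactly, which is why a single lemma can drive both PAC-Bayesian and online Bayesian bound arguments: each setting plugs in a different φ (a scaled loss or log-loss term).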


Published in
ICML '06: Proceedings of the 23rd International Conference on Machine Learning
June 2006, 1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844
Copyright © 2006 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


ICML '06 paper acceptance rate: 140 of 548 submissions (26%); overall acceptance rate: 140 of 548 submissions (26%)
