Abstract
In this paper we construct a general method for reporting on the accuracy of density estimation. Using variational methods from statistical learning theory we derive a PAC, algorithm-dependent bound on the distance between the data generating distribution and a learned approximation. The distance measure takes the role of a loss function that can be tailored to the learning problem, enabling us to control discrepancies on tasks relevant to subsequent inference. We apply the bound to an efficient mixture learning algorithm. Using the method of localisation we encode properties of both the algorithm and the data generating distribution, producing a tight, empirical, algorithm-dependent upper risk bound on the performance of the learner. We discuss other uses of the bound for arbitrary distributions and model averaging.
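The tailored bound itself is developed in the body of the paper and is not reproduced on this page. As background only, the construction starts from the standard PAC-Bayes theorem for a loss bounded in [0, 1] (the McAllester/Maurer form), sketched below in LaTeX. Here \pi is a prior and \rho any posterior over candidate densities, L and \hat{L}_n are the true and empirical losses (in the paper these are built from the tailored distance measure), n is the sample size, and \delta the confidence parameter; the paper's contribution is the choice of loss and the localised, algorithm-dependent prior, neither of which is shown here.

% Standard PAC-Bayes bound, given for orientation; not the paper's localised bound.
\[
  \mathbb{E}_{h \sim \rho}\!\left[L(h)\right]
  \;\le\;
  \mathbb{E}_{h \sim \rho}\!\left[\hat{L}_n(h)\right]
  + \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
  \qquad \text{with probability at least } 1 - \delta .
\]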
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Higgs, M., Shawe-Taylor, J. (2010). A PAC-Bayes Bound for Tailored Density Estimation. In: Hutter, M., Stephan, F., Vovk, V., Zeugmann, T. (eds.) Algorithmic Learning Theory. ALT 2010. Lecture Notes in Computer Science, vol. 6331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16108-7_15
DOI: https://doi.org/10.1007/978-3-642-16108-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16107-0
Online ISBN: 978-3-642-16108-7