Abstract
We propose a sequential randomized algorithm which, at each step, concentrates on functions having both low risk and low variance with respect to the previous step's prediction function. It satisfies a simple risk bound, which is sharp to the extent that the standard statistical learning approach, based on suprema of empirical processes, does not lead to algorithms with such a tight guarantee on their efficiency. Our generalization error bounds complement the pioneering work of Cesa-Bianchi et al. [12], in which standard-style statistical results were recovered with tight constants using a worst-case analysis.
A nice feature of our analysis of the randomized estimator is that it brings out the links between the probabilistic and worst-case viewpoints. It also allows us to recover recent model selection results due to Juditsky et al. [16] and to improve them in least squares regression with heavy noise, i.e., when no exponential moment condition is assumed on the output.
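To make the mechanism concrete, the following Python sketch illustrates the flavor of such a procedure for a finite set of prediction functions under squared loss. It is an illustrative toy under stated assumptions, not the algorithm analyzed in the paper: the drawn predictor is sampled from Gibbs-style weights that penalize both cumulative loss (low risk) and squared deviation from the predictor drawn at the current step (low variance). The function name and the temperature parameters `lam` and `gamma` are hypothetical choices, not quantities from the paper.

```python
import numpy as np

def randomized_variance_controlled(experts, X, y, lam=1.0, gamma=1.0, seed=0):
    """Toy sketch of a sequential randomized aggregation scheme.

    NOTE: an assumption-laden illustration, not the paper's exact procedure.
    experts: list of prediction functions f(x) -> float.
    """
    rng = np.random.default_rng(seed)
    M = len(experts)
    cum_loss = np.zeros(M)   # cumulative squared loss of each expert ("risk")
    cum_dev = np.zeros(M)    # cumulative squared deviation from drawn predictors ("variance")
    preds = []
    for x_t, y_t in zip(X, y):
        # Gibbs weights built from past data only: concentrate on
        # low-risk, low-variance functions.
        scores = -lam * cum_loss - gamma * cum_dev
        scores -= scores.max()            # numerical stability
        w = np.exp(scores)
        w /= w.sum()
        j = rng.choice(M, p=w)            # randomized draw of the predictor
        f_vals = np.array([f(x_t) for f in experts])
        preds.append(f_vals[j])
        # Update risk and variance statistics after observing y_t.
        cum_loss += (f_vals - y_t) ** 2
        cum_dev += (f_vals - f_vals[j]) ** 2
    return np.array(preds)

# Toy usage: two constant experts on a noisy constant signal.
if __name__ == "__main__":
    experts = [lambda x: 0.0, lambda x: 1.0]
    X = np.zeros(50)
    y = 1.0 + 0.1 * np.random.default_rng(1).standard_normal(50)
    print(randomized_variance_controlled(experts, X, y).mean())
```

The variance penalty is what distinguishes this from plain exponential weighting: an expert that merely tracks the average loss but disagrees wildly with the currently drawn predictor is down-weighted, which is the intuition behind the tighter risk bound claimed above.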
References
Alquier, P.: Iterative feature selection in least square regression estimation. Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7 (2005)
Audibert, J.-Y.: Aggregated estimators and empirical complexity for least square regression. Ann. Inst. Henri Poincaré, Probab. Stat. 40(6), 685–736 (2004)
Audibert, J.-Y.: A better variance control for PAC-Bayesian classification. Preprint n.905, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7 (2004), http://www.proba.jussieu.fr/mathdoc/preprints/index.html
Audibert, J.-Y.: PAC-Bayesian statistical learning theory. PhD thesis, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7 (2004)
Barron, A.: Are Bayes rules consistent in information? In: Cover, T.M., Gopinath, B. (eds.) Open Problems in Communication and Computation, pp. 85–91. Springer, Heidelberg (1987)
Yang, Y., Barron, A.: Information-theoretic determination of minimax rates of convergence. Ann. Stat. 27(5), 1564–1599 (1999)
Bunea, F., Nobel, A.: Sequential procedures for aggregating arbitrary estimators of a conditional mean, Technical report (2005), available from: http://stat.fsu.edu/%7Eflori/ps/bnapril2005IEEE.pdf
Catoni, O.: Statistical Learning Theory and Stochastic Optimization: Ecole d’été de Probabilités de Saint-Flour XXXI. Lecture Notes in Mathematics. Springer, Heidelberg (2001)
Catoni, O.: A mixture approach to universal model selection. Preprint LMENS 97-30 (1997), available from: http://www.dma.ens.fr/edition/preprints/Index.97.html
Catoni, O.: Universal aggregation rules with exact bias bound. Preprint n.510 (1999), http://www.proba.jussieu.fr/mathdoc/preprints/index.html#1999
Catoni, O.: A PAC-Bayesian approach to adaptive classification. Preprint n.840, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7 (2003)
Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D.P., Schapire, R.E., Warmuth, M.K.: How to use expert advice. J. ACM 44(3), 427–485 (1997)
Cesa-Bianchi, N., Lugosi, G.: On prediction of individual sequences. Ann. Stat. 27(6), 1865–1895 (1999)
Dudley, R.M.: Central limit theorems for empirical measures. Ann. Probab. 6, 899–929 (1978)
Haussler, D., Kivinen, J., Warmuth, M.K.: Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory 44(5), 1906–1925 (1998)
Juditsky, A., Rigollet, P., Tsybakov, A.B.: Learning by mirror averaging (2005), available from the arXiv website
Kivinen, J., Warmuth, M.K.: Averaging expert predictions. In: Fischer, P., Simon, H.U. (eds.) EuroCOLT 1999. LNCS, vol. 1572, pp. 153–167. Springer, Heidelberg (1999)
Merhav, N., Feder, M.: Universal prediction. IEEE Transactions on Information Theory 44(6), 2124–2147 (1998)
Vapnik, V.: The nature of statistical learning theory. Springer, New York (1995)
Vovk, V.G.: Aggregating strategies. In: COLT 1990: Proceedings of the Third Annual Workshop on Computational Learning Theory, pp. 371–386. Morgan Kaufmann Publishers Inc., San Francisco (1990)
Vovk, V.G.: A game of prediction with expert advice. Journal of Computer and System Sciences 56(2), 153–173 (1998)
Yang, Y.: Combining different procedures for adaptive regression. Journal of Multivariate Analysis 74, 135–161 (2000)
Yaroshinsky, R., El-Yaniv, R., Seiden, S.S.: How to better use expert advice. Mach. Learn. 55(3), 271–309 (2004)
Zhang, T.: Information theoretical upper and lower bounds for statistical estimation. IEEE Transactions on Information Theory (to appear, 2006)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Audibert, J.-Y. (2006). A Randomized Online Learning Algorithm for Better Variance Control. In: Lugosi, G., Simon, H.U. (eds.) Learning Theory. COLT 2006. Lecture Notes in Computer Science, vol. 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35294-5
Online ISBN: 978-3-540-35296-9