Abstract
We consider algorithms for combining advice from a set of experts. In each trial, the algorithm receives the predictions of the experts and produces its own prediction. A loss function is applied to measure the discrepancy between the predictions and actual observations. The algorithm keeps a weight for each expert. At each trial the weights are first used to help produce the prediction and then updated according to the observed outcome. Our starting point is Vovk’s Aggregating Algorithm, in which the weights have a simple form: the weight of an expert decreases exponentially as a function of the loss incurred by the expert. The prediction of the Aggregating Algorithm is typically a non-linear function of the weights and the experts’ predictions. We analyze here a simplified algorithm in which the weights are as in the original Aggregating Algorithm, but the prediction is simply the weighted average of the experts’ predictions. We show that for a large class of loss functions, even with the simplified prediction rule the additional loss of the algorithm over the loss of the best expert is at most c ln n, where n is the number of experts and c a constant that depends on the loss function. Thus, the bound is of the same form as the known bounds for the Aggregating Algorithm, although the constants here are not quite as good. We use relative entropy to rewrite the bounds in a stronger form and to motivate the update.
Supported by NSF grant CCR 9700201
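The simplified algorithm described in the abstract can be summarized in a short sketch. The following Python fragment is not taken from the paper; it only illustrates the two ingredients named above: the exponential weight update of the Aggregating Algorithm and the plain weighted-average prediction rule. The square loss, the learning rate eta, and the uniform initial weights are illustrative assumptions, whereas the paper's analysis covers a general class of loss functions.

    import math

    def averaging_forecaster(expert_predictions, outcomes, eta=0.5):
        # Minimal sketch, not the paper's exact algorithm: square loss and
        # the learning rate eta are illustrative choices.
        n = len(expert_predictions[0])            # number of experts
        weights = [1.0 / n] * n                   # uniform prior weights
        total_loss = 0.0
        expert_losses = [0.0] * n

        for preds, y in zip(expert_predictions, outcomes):
            # Prediction: the weighted average of the experts' predictions.
            w_sum = sum(weights)
            y_hat = sum(w * x for w, x in zip(weights, preds)) / w_sum
            total_loss += (y_hat - y) ** 2

            # Update: each expert's weight decays exponentially with its loss.
            for i, x in enumerate(preds):
                loss = (x - y) ** 2
                expert_losses[i] += loss
                weights[i] *= math.exp(-eta * loss)

        return total_loss, min(expert_losses)

    # Example: three experts, four trials with outcomes in [0, 1].
    preds = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.8], [0.0, 0.6, 1.0], [0.3, 0.5, 0.7]]
    outcomes = [0.2, 0.3, 0.1, 0.4]
    alg_loss, best_expert_loss = averaging_forecaster(preds, outcomes)

In terms of the bound stated in the abstract, alg_loss would be compared against best_expert_loss plus c ln n, where n is the number of experts and c is a constant depending on the loss function.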
References
N. Cesa-Bianchi, Y. Freund, D. Haussler, D.P. Helmbold, R.E. Schapire, and M.K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, 1997.
Y. Freund and R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, August 1997.
Y. Freund, R.E. Schapire, Y. Singer, and M.K. Warmuth. Using and combining predictors that specialize. In Proc. 29th ACM Symposium on Theory of Computing, pages 334–343. ACM, 1997.
D.P. Helmbold, J. Kivinen, and M.K. Warmuth. Worst-case loss bounds for sigmoided linear neurons. In Proc. 1995 Neural Information Processing Conference, pages 309–315. MIT Press, Cambridge, MA, November 1995.
D. Haussler, J. Kivinen, and M.K. Warmuth. Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory, 44(5):1906–1925, September 1998.
J. Kivinen and M.K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1):1–64, January 1997.
N. Littlestone and M.K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.
V. Vovk. Aggregating strategies. In Proc. 3rd Annu. Workshop on Comput. Learning Theory, pages 371–383. Morgan Kaufmann, 1990.
Cite this paper
Kivinen, J., Warmuth, M.K. (1999). Averaging Expert Predictions. In: Fischer, P., Simon, H.U. (eds) Computational Learning Theory. EuroCOLT 1999. Lecture Notes in Computer Science(), vol 1572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49097-3_13