Abstract
We study the online decision problem in which there are \(T\) steps to play and \(n\) actions to choose from. For this problem, several algorithms achieve an optimal regret of \(O(\sqrt{T \ln n})\), but they all require about \(T^n\) states, which one may not be able to afford when \(n\) and \(T\) are very large. We are interested in such large-scale problems, and we would like to understand what an online algorithm can achieve with only a bounded number of states. We provide two algorithms, both with \(m^{n-1}\) states, for a parameter \(m\), which achieve regret of \(O(m + (T/m)\ln(mn))\) and \(O(n \sqrt{m} + T/\sqrt{m})\), respectively. We also show that any online algorithm with \(m^{n-1}\) states must suffer a regret of \(\Omega(T/m)\), which is close to what our algorithms achieve.
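For readers unfamiliar with the unbounded-memory baseline, the following is a minimal sketch of the standard Hedge (multiplicative weights) algorithm, which is one of the well-known methods attaining regret \(O(\sqrt{T \ln n})\). It is not either of the bounded-memory algorithms of this paper. The function name `hedge`, the loss-matrix input format, the restriction to losses in \([0, 1]\), and the learning rate \(\eta = \sqrt{8 \ln n / T}\) are assumptions made for illustration.

```python
import math
import random


def hedge(losses, eta):
    """Run Hedge on a T x n matrix of losses in [0, 1] (illustrative sketch).

    Returns (expected cumulative loss of the algorithm,
             cumulative loss of the best fixed action in hindsight).
    """
    n = len(losses[0])
    weights = [1.0] * n          # one weight per action
    cumulative = [0.0] * n       # cumulative loss of each fixed action
    alg_loss = 0.0

    for loss_t in losses:
        total = sum(weights)
        probs = [w / total for w in weights]   # distribution played this step
        alg_loss += sum(p * l for p, l in zip(probs, loss_t))
        for i, l in enumerate(loss_t):
            cumulative[i] += l
        # Multiplicative update; starting from the normalized probabilities
        # keeps the weights from underflowing over long horizons.
        weights = [p * math.exp(-eta * l) for p, l in zip(probs, loss_t)]

    return alg_loss, min(cumulative)


if __name__ == "__main__":
    random.seed(0)
    T, n = 500, 8                                  # hypothetical problem size
    losses = [[random.random() for _ in range(n)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(n) / T)           # assumed tuning, needs T in advance
    alg, best = hedge(losses, eta)
    print("regret:", alg - best)
    print("bound :", math.sqrt(T * math.log(n) / 2))
```

With this tuning the regret is at most \(\sqrt{(T \ln n)/2}\) for any loss sequence in \([0, 1]\). The sketch keeps one real-valued weight per action, each depending on the whole loss history, which is the kind of memory cost the abstract sets out to avoid.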
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, CJ., Lu, WF. (2011). Making Online Decisions with Bounded Memory. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds) Algorithmic Learning Theory. ALT 2011. Lecture Notes in Computer Science, vol 6925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24412-4_21
DOI: https://doi.org/10.1007/978-3-642-24412-4_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24411-7
Online ISBN: 978-3-642-24412-4
eBook Packages: Computer Science, Computer Science (R0)