Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5323)

Abstract

We consider reinforcement learning in the parameterized setup, where the model is known to belong to a finite set of Markov Decision Processes (MDPs), under the discounted return criterion. We propose an on-line algorithm for learning in such parameterized models, the Parameter Elimination (PEL) algorithm, and analyze its performance in terms of the total mistake bound criterion. The algorithm relies on Wald's sequential probability ratio test to eliminate unlikely parameters, and uses an optimistic policy for effective exploration. We establish that, with high probability, the total mistake bound for the algorithm is linear (up to a logarithmic term) in the size of the parameter space, independently of the cardinality of the state and action spaces. We further demonstrate that a much better dependence on the size of the parameter space is possible, depending on the specific information structure of the problem.
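
As a rough illustration of the elimination idea described in the abstract (not the PEL algorithm as specified in the paper), the Python fragment below keeps a log-likelihood score for each candidate parameter based on the observed transitions, and discards any candidate whose likelihood ratio against the current best candidate falls below a fixed threshold, in the spirit of Wald's sequential probability ratio test. The model representation, the function name, and the threshold value are illustrative assumptions, and the optimistic exploration policy used by PEL is omitted entirely.

    import numpy as np

    def eliminate_parameters(models, transitions, threshold=np.log(100.0)):
        """Illustrative likelihood-ratio elimination (not the published PEL algorithm).

        models: list of dicts mapping (state, action) -> list of next-state probabilities.
        transitions: iterable of observed (state, action, next_state) triples.
        Returns the set of indices of candidate parameters that survive.
        """
        active = set(range(len(models)))
        loglik = np.zeros(len(models))
        for s, a, s_next in transitions:
            for i in list(active):
                p = models[i][(s, a)][s_next]
                loglik[i] += np.log(max(p, 1e-12))  # guard against log(0)
            leader = max(active, key=lambda i: loglik[i])
            # Drop candidates whose log-likelihood ratio against the current
            # leader has fallen below the (illustrative) threshold.
            active = {i for i in active if loglik[leader] - loglik[i] < threshold}
        return active

    # Example: two hypothetical candidate models over states {0, 1}; only
    # transitions out of state 0 are observed in this toy run.
    models = [
        {(0, 0): [0.9, 0.1], (0, 1): [0.5, 0.5]},
        {(0, 0): [0.1, 0.9], (0, 1): [0.5, 0.5]},
    ]
    observations = [(0, 0, 0)] * 20  # repeated evidence favouring the first model
    print(eliminate_parameters(models, observations))  # prints {0}

In the paper itself the elimination threshold is tied to the confidence level of the sequential test and to the mistake-bound analysis; the fixed constant above is only a placeholder.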


References

  1. Brafman, R.I., Tennenholtz, M.: R-max - a general polynomial time algorithm for near-optimal reinforcement learning. JMLR 3, 213–231 (2002)

  2. Dyagilev, K., Mannor, S., Shimkin, N.: Efficient reinforcement learning in parameterized models. Technical report, Technion (2008), http://www.ee.technion.ac.il/people/shimkin/PREPRINTS/PEL_full.pdf

  3. Kailath, T.: The divergence and Bhattacharyya distance measures in signal selection. IEEE Transactions on Communication Technology COM-15(1), 52–60 (1967)

  4. Kakade, S.M.: On the Sample Complexity of Reinforcement Learning. Ph.D. thesis, University College London (2003)

  5. Kearns, M.J., Koller, D.: Efficient reinforcement learning in factored MDPs. In: IJCAI, pp. 740–747 (1999)

  6. Kearns, M.J., Singh, S.P.: Near-optimal reinforcement learning in polynomial time. Machine Learning 49, 209–232 (2002)

  7. Kumar, P.R., Varaiya, P.: Stochastic Systems: Estimation, Identification and Adaptive Control. The MIT Press, Cambridge (1998)

  8. Mannor, S., Tsitsiklis, J.N.: The sample complexity of exploration in the multi-armed bandit problem. JMLR 5, 623–648 (2004)

  9. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Chichester (1994)

  10. Ryabko, D., Hutter, M.: Asymptotic learnability of reinforcement problems with arbitrary dependence. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds.) ALT 2006. LNCS, vol. 4264, pp. 334–347. Springer, Heidelberg (2006)

  11. Strehl, A.L., Li, L., Wiewiora, E., Langford, J., Littman, M.L.: PAC model-free reinforcement learning. In: Proceedings of ICML 2006 (2006)

  12. Strehl, A.L., Littman, M.L.: A theoretical analysis of model-based interval estimation. In: Proceedings of ICML 2005, pp. 857–864 (2005)

  13. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (1998)

  14. Wald, A.: Sequential Analysis. Wiley, Chichester (1952)

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dyagilev, K., Mannor, S., Shimkin, N. (2008). Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science, vol 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_4

  • DOI: https://doi.org/10.1007/978-3-540-89722-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89721-7

  • Online ISBN: 978-3-540-89722-4

  • eBook Packages: Computer Science, Computer Science (R0)
