Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5323)

Abstract

We consider reinforcement learning in the parameterized setup, where the model is known to belong to a finite set of Markov Decision Processes (MDPs), under the discounted return criterion. We propose an on-line algorithm for learning in such parameterized models, the Parameter Elimination (PEL) algorithm, and analyze its performance in terms of the total mistake bound criterion. The algorithm relies on Wald's sequential probability ratio test to eliminate unlikely parameters, and uses an optimistic policy for effective exploration. We establish that, with high probability, the total mistake bound for the algorithm is linear (up to a logarithmic term) in the size of the parameter space, independently of the cardinality of the state and action spaces. We further demonstrate that a much better dependence on the size of the parameter space is possible, depending on the specific information structure of the problem.
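
As a rough illustration of the elimination idea described in the abstract (not the PEL algorithm as specified in the paper), the Python fragment below keeps a log-likelihood score for each candidate parameter based on the observed transitions, and discards any candidate whose likelihood ratio against the current best candidate falls below a fixed threshold, in the spirit of Wald's sequential probability ratio test. The model representation, the function name, and the threshold value are illustrative assumptions, and the optimistic exploration policy used by PEL is omitted entirely.

    import numpy as np

    def eliminate_parameters(models, transitions, threshold=np.log(100.0)):
        """Illustrative likelihood-ratio elimination (not the published PEL algorithm).

        models: list of dicts mapping (state, action) -> list of next-state probabilities.
        transitions: iterable of observed (state, action, next_state) triples.
        Returns the set of indices of candidate parameters that survive.
        """
        active = set(range(len(models)))
        loglik = np.zeros(len(models))
        for s, a, s_next in transitions:
            for i in list(active):
                p = models[i][(s, a)][s_next]
                loglik[i] += np.log(max(p, 1e-12))  # guard against log(0)
            leader = max(active, key=lambda i: loglik[i])
            # Drop candidates whose log-likelihood ratio against the current
            # leader has fallen below the (illustrative) threshold.
            active = {i for i in active if loglik[leader] - loglik[i] < threshold}
        return active

    # Example: two hypothetical candidate models over states {0, 1}; only
    # transitions out of state 0 are observed in this toy run.
    models = [
        {(0, 0): [0.9, 0.1], (0, 1): [0.5, 0.5]},
        {(0, 0): [0.1, 0.9], (0, 1): [0.5, 0.5]},
    ]
    observations = [(0, 0, 0)] * 20  # repeated evidence favouring the first model
    print(eliminate_parameters(models, observations))  # prints {0}

In the paper itself the elimination threshold is tied to the confidence level of the sequential test and to the mistake-bound analysis; the fixed constant above is only a placeholder.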


References

  1. Brafman, R.I., Tennenholtz, M.: R-max - a general polynomial time algorithm for near-optimal reinforcement learning. JMLR 3, 213–231 (2002)

  2. Dyagilev, K., Mannor, S., Shimkin, N.: Efficient reinforcement learning in parameterized models. Technical report, Technion (2008), http://www.ee.technion.ac.il/people/shimkin/PREPRINTS/PEL_full.pdf

  3. Kailath, T.: The divergence and Bhattacharyya distance measures in signal selection. IEEE Transactions on Communication Technology COM-15(1), 52–60 (1967)

  4. Kakade, S.M.: On the Sample Complexity of Reinforcement Learning. Ph.D. thesis, University College London (2003)

  5. Kearns, M.J., Koller, D.: Efficient reinforcement learning in factored MDPs. In: IJCAI, pp. 740–747 (1999)

  6. Kearns, M.J., Singh, S.P.: Near-optimal reinforcement learning in polynomial time. Machine Learning 49, 209–232 (2002)

  7. Kumar, P.R., Varaiya, P.: Stochastic Systems: Estimation, Identification and Adaptive Control. The MIT Press, Cambridge (1998)

  8. Mannor, S., Tsitsiklis, J.N.: The sample complexity of exploration in the multi-armed bandit problem. JMLR 5, 623–648 (2004)

  9. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Chichester (1994)

  10. Ryabko, D., Hutter, M.: Asymptotic learnability of reinforcement problems with arbitrary dependence. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds.) ALT 2006. LNCS, vol. 4264, pp. 334–347. Springer, Heidelberg (2006)

  11. Strehl, A.L., Li, L., Wiewiora, E., Langford, J., Littman, M.L.: PAC model-free reinforcement learning. In: Proceedings of ICML 2006 (2006)

  12. Strehl, A.L., Littman, M.L.: A theoretical analysis of model-based interval estimation. In: Proceedings of ICML 2005, pp. 857–864 (2005)

  13. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (1998)

  14. Wald, A.: Sequential Analysis. Wiley, Chichester (1952)

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dyagilev, K., Mannor, S., Shimkin, N. (2008). Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science, vol 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_4

  • DOI: https://doi.org/10.1007/978-3-540-89722-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89721-7

  • Online ISBN: 978-3-540-89722-4

  • eBook Packages: Computer Science, Computer Science (R0)
