Abstract
We consider a reinforcement learning setting introduced in [5] where the learner does not have explicit access to the states of the underlying Markov decision process (MDP). Instead, she has access to several models that map histories of past interactions to states. Here we improve over known regret bounds in this setting and, more importantly, generalize to the case where the set of models given to the learner does not contain a true model inducing an MDP representation, but only approximations of it. We also give improved error bounds for state aggregation.
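As a rough illustration of the setting (this sketch is not from the paper; the model names phi_last_obs and phi_window are hypothetical), a "state-representation model" can be thought of as a map from the history of past interactions to a state, and the learner is handed several such candidate maps, at most one of which (or, in the approximate case studied here, none of which) turns the process into an MDP:

    # Illustrative sketch only, not part of the paper.
    from typing import Callable, Hashable, List, Tuple

    # A history is a sequence of (observation, action, reward) triples.
    History = List[Tuple[Hashable, Hashable, float]]
    # A state-representation model maps any history to a state identifier.
    HistoryModel = Callable[[History], Hashable]

    def phi_last_obs(history: History) -> Hashable:
        """Candidate model: the state is the most recent observation."""
        return history[-1][0] if history else None

    def phi_window(history: History, k: int = 2) -> Hashable:
        """Candidate model: the state is the tuple of the last k observations."""
        return tuple(obs for obs, _, _ in history[-k:])

    # The learner is given a finite set of such candidate models.
    candidate_models: List[HistoryModel] = [phi_last_obs, phi_window]

    if __name__ == "__main__":
        h: History = [("o1", "a1", 0.0), ("o2", "a1", 1.0), ("o1", "a2", 0.0)]
        for phi in candidate_models:
            print(phi.__name__, "->", phi(h))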
References
Bartlett, P.L., Tewari, A.: REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs. In: Proc. 25th Conf. on Uncertainty in Artificial Intelligence, UAI 2009, pp. 25–42. AUAI Press (2009)
Hallak, A., Di Castro, D., Mannor, S.: Model selection in Markovian processes. In: 19th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, KDD 2013, pp. 374–382. ACM (2013)
Jaksch, T., Ortner, R., Auer, P.: Near-optimal regret bounds for reinforcement learning. J. Mach. Learn. Res. 11, 1563–1600 (2010)
Littman, M., Sutton, R., Singh, S.: Predictive representations of state. Adv. Neural Inf. Process. Syst. 15, 1555–1561 (2002)
Hutter, M.: Feature Reinforcement Learning: Part I: Unstructured MDPs. J. Artificial General Intelligence 1, 3–24 (2009)
Maillard, O.A., Nguyen, P., Ortner, R., Ryabko, D.: Optimal regret bounds for selecting the state representation in reinforcement learning. In: Proc. 30th Int’l Conf. on Machine Learning, ICML 2013. JMLR Proc., vol. 28, pp. 543–551 (2013)
Nguyen, P., Maillard, O.A., Ryabko, D., Ortner, R.: Competing with an infinite set of models in reinforcement learning. In: Proc. 16th Int’l Conf. on Artificial Intelligence and Statistics, AISTATS 2013. JMLR Proc., vol. 31, pp. 463–471 (2013)
Ortner, R.: Pseudometrics for state aggregation in average reward Markov decision processes. In: Hutter, M., Servedio, R.A., Takimoto, E. (eds.) ALT 2007. LNCS (LNAI), vol. 4754, pp. 373–387. Springer, Heidelberg (2007)
Ortner, R., Maillard, O.-A., Ryabko, D.: Selecting near-optimal approximate state representations in reinforcement learning. Extended version, http://arxiv.org/abs/1405.2652
Ortner, R., Ryabko, D.: Online regret bounds for undiscounted continuous reinforcement learning. Adv. Neural Inf. Process. Syst. 25, 1772–1780 (2012)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ortner, R., Maillard, O.-A., Ryabko, D. (2014). Selecting Near-Optimal Approximate State Representations in Reinforcement Learning. In: Auer, P., Clark, A., Zeugmann, T., Zilles, S. (eds.) Algorithmic Learning Theory. ALT 2014. Lecture Notes in Computer Science, vol. 8776. Springer, Cham. https://doi.org/10.1007/978-3-319-11662-4_11
DOI: https://doi.org/10.1007/978-3-319-11662-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11661-7
Online ISBN: 978-3-319-11662-4