Synonyms
Definition
An agent acting in a world makes observations, takes actions, and receives rewards for the actions taken. Given a history of such interactions, the agent must choose its next action so as to maximize the long-term sum of rewards. To do this well, the agent may take actions that are suboptimal in the short term but that gather the information needed to later act optimally or near-optimally. These information-gathering actions are generally considered exploration actions.
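To make the distinction concrete, the short sketch below (not part of the original entry) shows an agent following an epsilon-greedy rule on an illustrative three-armed bandit: with small probability it takes a possibly suboptimal, information-gathering action, and otherwise it exploits its current reward estimates. The bandit, the epsilon value, and all variable names are assumptions made for illustration.

import random

# Illustrative sketch only: epsilon-greedy exploration on a 3-armed bandit.
TRUE_MEANS = [0.2, 0.5, 0.8]   # hidden success probability of each action (assumed)
EPSILON = 0.1                  # fraction of steps spent exploring
STEPS = 10_000

counts = [0, 0, 0]             # how often each action has been tried
estimates = [0.0, 0.0, 0.0]    # running average reward per action
total_reward = 0.0

for t in range(STEPS):
    if random.random() < EPSILON:
        action = random.randrange(len(TRUE_MEANS))           # explore: gather information
    else:
        action = max(range(3), key=lambda a: estimates[a])   # exploit current estimates
    reward = 1.0 if random.random() < TRUE_MEANS[action] else 0.0  # observe reward
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]  # update running average
    total_reward += reward

print("estimated means:", [round(e, 3) for e in estimates])
print("average reward: %.3f" % (total_reward / STEPS))

In the entry's terms, every randomly chosen step is an exploration action: it usually sacrifices immediate reward but improves the estimates that later exploitation relies on.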
Motivation
Since gathering information about the world generally involves taking actions that are suboptimal relative to a later learned policy, minimizing the number of information-gathering actions helps optimize the standard goal in reinforcement learning. In addition, understanding exploration well is key to understanding reinforcement learning well, since exploration is a key aspect of reinforcement learning that is missing from...
Recommended Reading
Abbeel P, Ng A (2005) Exploration and apprenticeship learning in reinforcement learning. In: ICML 2005, Bonn
Brafman RI, Tennenholtz M (2002) R-MAX – a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3:213–231
Brunskill E, Leffler BR, Li L, Littman ML, Roy N (2008) CORL: a continuous-state offset-dynamics reinforcement learner. In: UAI-08, Helsinki July 2008
Kakade S (2003) On the sample complexity of reinforcement learning. PhD thesis, Gatsby Computational Neuroscience Unit, University College London
Kakade S, Kearns M, Langford J (2003) Exploration in metric state spaces. In: ICML 2003, Washington, DC
Kearns M, Koller D (1999) Efficient reinforcement learning in factored MDPs. In: Proceedings of the 16th international joint conference on artificial intelligence. Morgan Kaufmann, San Francisco, pp 740–747
Kearns M, Singh S (1998) Near-optimal reinforcement learning in polynomial time. In: ICML 1998. Morgan Kaufmann, San Francisco, pp 260–268
Poupart P, Vlassis N, Hoey J, Regan K (2006) An analytic solution to discrete Bayesian reinforcement learning. In: ICML 2006. ACM Press, New York, pp 697–704
Strehl AL (2007) Probably approximately correct (PAC) exploration in reinforcement learning. PhD thesis, Rutgers University
Strehl AL, Li L, Wiewiora E, Langford J, Littman ML (2006) PAC model-free reinforcement learning. In: Proceedings of the 23rd international conference on machine learning (ICML 2006), Pittsburgh, pp 881–888
Watkins C, Dayan P (1992) Q-learning. Mach Learn J 8:279–292