Intent recognition; Inverse optimal control; Plan recognition
Definition
Inverse reinforcement learning (inverse RL) considers the problem of extracting a reward function from observed (nearly) optimal behavior of an expert acting in an environment.
Motivation and Background
The motivation for inverse RL is twofold:
For many RL applications, it is difficult to write down an explicit reward function specifying how different desiderata should be traded off exactly. In fact, engineers often spend significant effort tweaking the reward function such that the optimal policy corresponds to performing the task they have in mind. For example, consider the task of driving a car well. Various desiderata have to be traded off, such as speed, following distance, lane preference, frequency of lane changes, distance from the curb, etc. Specifying the reward function for the task of driving requires explicitly writing down the trade-off between these features.
Inverse RL...
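The driving example in the first motivation can be made concrete with a small sketch. It assumes, purely for illustration, that the reward is a weighted sum of the listed features; the entry text above does not commit to this form, and the feature values, weight vectors, and function name are all hypothetical. The sketch shows that which behavior scores higher depends entirely on the hand-chosen weights, which is the trade-off an engineer would otherwise have to specify explicitly.

```python
# Illustrative only: a linear reward over hand-made driving features.
# The linear form, the feature values, and the weights are assumptions,
# not taken from the entry.

FEATURES = [
    "speed",
    "following_distance",
    "lane_preference",
    "lane_change_frequency",
    "distance_from_curb",
]


def reward(features, weights):
    """Return the weighted sum of feature values for a single state."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


# Two made-up driving states, one value per entry in FEATURES (normalized to [0, 1]).
cautious = [0.5, 0.9, 1.0, 0.1, 0.8]
aggressive = [0.9, 0.2, 0.5, 0.7, 0.4]

# Two hand-picked weight vectors encoding different trade-offs.
w_safety = [0.2, 1.0, 0.3, -0.5, 0.5]
w_speed = [1.0, 0.1, 0.0, -0.1, 0.1]

for name, w in [("safety-oriented", w_safety), ("speed-oriented", w_speed)]:
    r_cautious = reward(cautious, w)
    r_aggressive = reward(aggressive, w)
    preferred = "cautious" if r_cautious > r_aggressive else "aggressive"
    print(f"{name}: cautious={r_cautious:.2f}, "
          f"aggressive={r_aggressive:.2f}, prefers {preferred}")
```

Under these made-up numbers the two weight vectors rank the two states in opposite order, so small changes to the weights can change which behavior is optimal.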
© 2017 Springer Science+Business Media New York
Abbeel, P., Ng, A.Y. (2017). Inverse Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_142
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science; Reference Module Computer Science and Engineering