Definition
Inverse reinforcement learning (inverse RL) considers the problem of extracting a reward function from observed (nearly) optimal behavior of an expert acting in an environment.
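A concrete, simplified illustration of one quantity at the heart of several inverse RL methods (e.g., apprenticeship learning): the discounted feature expectations of a policy, estimated from trajectories. Under the common assumption of a reward that is linear in state features, R(s) = w·φ(s), the difference between the expert's feature expectations and a baseline's yields a candidate reward weight vector. The toy trajectories, feature map, and function names below are illustrative assumptions, not part of the entry.

```python
# Sketch: discounted feature expectations and a single max-margin-style
# step toward a reward weight vector. Toy data only; assumes a linear
# reward R(s) = w . phi(s) over one-hot state features.
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Empirical discounted feature expectations mu = E[sum_t gamma^t phi(s_t)]."""
    mu = np.zeros_like(phi(trajectories[0][0]), dtype=float)
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi(s)
    return mu / len(trajectories)

# Toy example: three states, features are one-hot state indicators.
phi = lambda s: np.eye(3)[s]
expert = [[2, 2, 2], [1, 2, 2]]   # expert behavior concentrates on state 2
baseline = [[0, 1, 0], [1, 0, 1]] # a non-expert baseline policy

mu_E = feature_expectations(expert, phi)
mu_B = feature_expectations(baseline, phi)

# A reward weight direction that makes the expert look better than the
# baseline: w points from the baseline's feature expectations toward
# the expert's (here, it assigns the highest weight to state 2).
w = mu_E - mu_B
w /= np.linalg.norm(w)
```

Full algorithms iterate this idea: they repeatedly solve the forward RL problem under the current reward estimate and update w until the expert's feature expectations can no longer be separated from the learner's.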
Motivation and Background
The motivation for inverse RL is twofold:
1. For many RL applications, it is difficult to write down an explicit reward function specifying how different desiderata should be traded off exactly. In fact, engineers often spend significant effort tweaking the reward function such that the optimal policy corresponds to performing the task they have in mind. For example, consider the task of driving a car well. Various desiderata have to be traded off, such as speed, following distance, lane preference, frequency of lane changes, distance from the curb, and so on. Specifying the reward function for the task of driving requires explicitly writing down the trade-off between these features.
Inverse RL algorithms...
Recommended Reading
Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In Proceedings of ICML, Banff, Alberta, Canada.
Doya, K., & Sejnowski, T. (1995). A novel reinforcement model of birdsong vocalization learning. In Neural Information Processing Systems 7. Cambridge, MA: MIT Press.
Montague, P. R., Dayan, P., Person, C., & Sejnowski, T. J. (1995). Bee foraging in uncertain environments using predictive Hebbian learning. Nature, 377(6551), 725–728.
Pomerleau, D. (1989). ALVINN: An autonomous land vehicle in a neural network. In NIPS 1. San Francisco, CA: Morgan Kaufmann.
Ratliff, N., Bagnell, J., & Zinkevich, M. (2006). Maximum margin planning. In Proceedings of ICML, Pittsburgh, Pennsylvania.
Ratliff, N., Bradley, D., Bagnell, J., & Chestnutt, J. (2007). Boosting structured prediction for imitation learning. In Neural Information Processing Systems 19. Cambridge, MA: MIT Press.
Sammut, C., Hurst, S., Kedzier, D., & Michie, D. (1992). Learning to fly. In Proceedings of ICML. Aberdeen, Scotland, UK.
Schmajuk, N. A., & Zanutto, B. S. (1997). Escape, avoidance, and imitation. Adaptive Behavior, 6, 63–129.
Taskar, B., Guestrin, C., & Koller, D. (2003). Max-margin Markov networks. In Neural Information Processing Systems 16 (NIPS 2003), Vancouver, Canada.
Touretzky, D. S., & Saksida, L. M. (1997). Operant conditioning in skinnerbots. Adaptive Behavior, 5, 219–247.
Watkins, C. J. (1989). Models of delayed reinforcement learning. PhD thesis, Psychology Department, Cambridge University.
© 2011 Springer Science+Business Media, LLC
Abbeel, P., Ng, A.Y. (2011). Inverse Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_417